Dec 312009

I have to admit that the previous post on Christmas is actually not much fun. Today I received another pResent from Yixuan which is more interesting:

Dec 242009

Life should be fun. I saw a post in R-help list saying Merry Christmas to other useRs, and I followed up by some R code which can produce a naive animation like this:

Here is the code to generate the above Flash animation with shining Christmas:

library(animation)
saveSWF({
    n = length(speed <- runif(angle <- runif(x <- strsplit("MERRY CHRISTMAS",
        "")[[1]], 0, 360), 0, 15))
    for (j in 1:300) {
        angle = angle + speed
        plot.new()
        plot.window(c(1, n), c(0, 1))
        for (i in 1:n) text(i, 0.5, x[i], srt = angle[i], cex = runif(1,
            1, 4), col = sample(colors(), 1))
        text(n, 0, "Yihui @ 2009-12-24 (http://yihui.name)",
            adj = c(1, 0), col = "white", cex = 0.8)
    }
}, interval = 0.04, dev = "pdf", outdir = getwd(), para = list(mar = rep(0,
    4), bg = "black"), width = 8, height = 1)
## in animation package (>=1.1-0), see demo('Xmas')

There are other animation formats in the R package animation:

  1. use saveMovie() to get a GIF animation (need ImageMagick)
  2. ani.start() and ani.stop() can produce an HTML page with the animation in it
  3. saveLatex() can embed an animation into a PDF document
Nov 112009

Since animation 1.0-9, we will be able to create a PDF document with an animation embedded in it; the function is saveLatex(), and its usage is similar to saveMovie() and saveSWF(): you pass an R expression for creating animations to this function, and this expression will be evaluated in the function; the image frames get recorded by a graphics device. In the end, a LaTeX document is written in a directory, and we can get a PDF document by running pdflatex on the document.

In fact, the key point is the LaTeX package named animate, which can be used to insert image frames into a PDF document to generate an animation. The interface of animations created by this package is quite similar to the HTML animation page by the R package animation, moreover, it also uses JavaScript (in PDF) to animate the image frames.

Oct 192009

I love R because there are always exciting new packages which can be far beyond your imagination. Here I’d like to introduce a couple of packages that look really awesome:

1. swfDevice: R graphics device for SWF output (by Cameron Bracken)

This package is still at a pre-alpha stage but you can see a sketch now in R-Forge: https://r-forge.r-project.org/projects/swfdevice/

Its author, Cameron, certainly knows well that I will be excited to see it, because I’ve been waiting for a long long time for the REAL Flash animation output in R. What I’ve done in my animation package is simply using SWF Tools to combine several “static” pictures (PNG or PDF, …) into a naive Flash animation — by “naive” I mean there is no interaction or real dynamic stuff in the Flash animation. Hopefully Cameron will provide a useful tool to create genuine Flash animations directly from R (with the help of the library libming).

By the way, I have to mention that the tikzDevice package by Cameron and another author is also fantastic for generating high-quality graphics LaTeX.

Oct 102009

Today Romain Francois posted an interesting topic in the R-help list, and you can read his blog post for more details: celebrating R commit #50000. 50000 is certainly not a small number; we do owe R core members a big “thank you” for their great efforts in this fantastic statistical language in the 13 years. When I saw Romain’s data, I suddenly remembered a question I asked to one of Prof Ripley’s student a couple of years ago: does Prof Ripley ever sleep? And he answered “No!”. No wonder we can see Prof Ripley so frequently in the R-help/devel mailing list. If you have stayed on R-help list for enough long time, you’ll surely know several facts, e.g. Martin Maechler will arrive in less than 3 minutes if you dare call an R package “library”, and you will get “Ripleyed” if you are not careful enough in posting your R code.

> library(fortunes)
> fortune("Ripleyed")

And the fear of getting Ripleyed on the mailing list also makes me think, read,
and improve before submitting half baked questions to the list.
 -- Eric Kort
 R-help (January 2006)
Sep 262009

As Sir Francis Bacon said, “Histories make men wise; poets witty; the mathematics subtile; natural philosophy deep; moral grave; logic and rhetoric able to contend.” And Windows stupid.

He should have added the last sentence if he were a Windows user in this age.

1. Avoid Using M$ Excel

A lot of R users often ask this question: “How to import MS Excel data into R?” Well, my suggestion is, avoid using M$ Excel if you are a statistician (or going to be a statistician) because you just cannot imagine how messy Excel data can be: some cells might be merged, some are colored, some texts are bold, several data tables can be put everywhere (e.g. cell(1,1) to (10,4), and (17,3) to (25,9)), stupid bar plots and pie charts are inserted in the sheets, silly statistical procedures that are wrong forever… If you don’t trust my words (yes, I’m a nobody), just read the examples here: Problems with Excel (collected by Prof Harrell).

I know there are reasons for you to continue using Excel. Your boss required you to do so; you don’t have time to learn more about various data formats; everybody is using Excel, and you don’t want to be so cool to use R; or if you finish your tasks too quickly and accurately, your boss will doubt whether you have really spent time on working, hence you will get less money paid (this is a REAL story for me – though I didn’t get less payment, I was indeed doubted when I used R); …

Aug 312009

Yanping Chen raised a question in the Chinese COS forum on the output of Eviews: how to (re)format the decimal coefficients in equations as text output? For example, we want to round the numbers in CC = 16.5547557654 + 0.0173022117998*PP + 0.216234040485 * PP(-1) + 0.810182697599 * (WP + WG) to the 3rd decimal places. This can be simply done by regular expressions, as decimals always begin with a “.”. The basic steps are:

  1. find out where are the decimals in the character string;
  2. format them;
  3. replace the original decimals with formatted values;

Given a character vector, we can format the decimals with the code below:

Aug 292009
Coin-flipping is a rather old topic in probability theory, so most of us think we know very well about it, however, the other day I saw a question about this old topic (in David Smith’s REvolution?) which was beyond me expectation: how many times do we need to flip the coin until we get a sequence of HTH and HTT respectively? (For example, for the sequence HHTH, the number for HTH to appear is 4, and in THTHTT, the number for HTT is 6.)

It seems that the two results are equivalent, as H and T occurs with equal probability 0.5, so we naturally believe the average numbers of steps to HTH and HTT are the same, but the fact is not as we imagined.

## smart guys use math formulae to solve the problem,
## but *lazy* guys like me use simulations with R
coin.seq = function(v) {
    x = NULL
    n = 0
    while (!identical(x, v)) {
        x = append(x[length(x) - 1:0], rbinom(1, 1, 0.5))
        n = n + 1
    }
    n
}
set.seed(919)
mean(htt <- replicate(1e+05, coin.seq(c(1, 0, 0))))
# [1] 8.00304
mean(hth <- replicate(1e+05, coin.seq(c(1, 0, 1))))
# [1] 10.0062

png("coin-htt-hth.png", height = 150, width = 500)
par(mar = c(3, 2.5, 0.1, 0), mgp = c(2, 0.8, 0))
boxplot(list(HTT = htt, HTH = hth), horizontal = T,
    xlab = "n", ylim = range(boxplot(list(HTT = htt, HTH = hth),
        plot = FALSE)$stats))
points(c(mean(htt), mean(hth)), 1:2, pch = 19)
dev.off()

The answer is counterintuitive, isn’t it?

Number of times needed to get HTT and HTH

Number of times needed to get HTT and HTH (bold segments are for median; dots denote mean)

Well, mathematicians certainly do not like my solution (I guess they even hate such an imprecise approach). I hope some smart guys can give me some hints on working out the probability distribution and hence the expectation.

Jun 122009
Linlin Yan posted a cool (hot?) simulation of burning fire with R in the COS forum yesterday, which was indeed a warm welcome. I’m not sure whether our forum members will be scared by the “fire” under the title “Welcome to COS Forum”. :grin: The fire was mainly created by the function image() with carefully designed rows and columns in heated colors heat.colors(). Here is one of the pictures generated from his code:

Simulation of Burning Fire in R

Simulation of Burning Fire in R

Jun 102009
Tag cloud is a bunch of words drawn in a graph with their sizes proportional to their frequency; it’s widely used in blogs to visualize tags. We can observe important words quickly from a tag cloud, as they often appear in large fontsize. Tony N. Brown asked how to “graphically represent frequency of words in a speech” the other day in R-help list, which is actually a problem about the tag cloud:

I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. [...]

Marc Schwartz mentioned that Gorjanc Gregor has done some work years ago using R (in grid graphics). The obstacle of creating tag cloud in R, as Gorjanc wrote, lies in deciding the placement of words, and it would be much easier for other applications such as browsers to arrange the texts. That’s true — there have already been a lot of mature programs to deal with tag cloud. One of them is the wp-cumulus plugin for WordPress, which makes use of a Flash object to generate the tag cloud, and it has fantastic 3D rotation effect of the cloud.

1. Arranging text labels with pointLabel()

Before introducing how to port the plugin into R, I’d like to introduce an R function pointLabel() in maptools package and it can partially solve the problem of arranging text labels in a plot (using simulated annealing or genetic algorithm). Here is a simulated example:

Simulated Tag Cloud with R function pointLabel() in maptools

Simulated Tag Cloud with R function pointLabel() in maptools

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2010 by Yihui Xie