Aug 14, 2010
Update: now the long hints for function parameters can be broken into several shorter lines.

Auto-completion is a fancy feature in a text editor. Notepad++ does not support auto-completion for the R language, so I spent a couple of hours creating an XML file to support R:

Download: R.xml (938Kb)

Put it under 'plugins/APIs' in the installation directory of Notepad++ (you can see several other XML files there supporting different languages such as C), and make sure you have enabled auto-completion in Notepad++ (Settings --> Preferences --> Backup/Auto-completion). Open an R script and start typing a familiar function (e.g. paste()); you will see some candidates in a drop-down list like this:

Show parameters of R functions in Notepad++

Hit the Enter key if the function name selected in the list is the one you want, then type '(' and you will see hints for parameters:

Auto-completion in Notepad++ for R script

The file R.xml was actually generated from R; it contains almost all visible R objects in the base R packages as well as recommended packages like MASS. You may create an extended XML file (containing keywords from other packages) yourself: load the packages you need into your current workspace, then run:

source('http://yihui.name/en/wp-content/uploads/2010/08/Npp_R_Auto_Completion.r')
# R.xml will be generated under your current work directory: getwd()
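
To give a rough idea of how such a file can be produced from R, here is a minimal sketch. It is not the script linked above: keyword_entry() and xml_escape() are hypothetical helpers, and the element names (KeyWord, Overload, Param) are my assumption about the layout of the XML files under 'plugins/APIs', so please compare the output with one of the existing files there before relying on it.

```r
# Escape the characters that are special inside XML attribute values
xml_escape = function(s) {
    s = gsub("&", "&amp;", s, fixed = TRUE)
    s = gsub("<", "&lt;", s, fixed = TRUE)
    s = gsub(">", "&gt;", s, fixed = TRUE)
    gsub("\"", "&quot;", s, fixed = TRUE)
}

# Build one <KeyWord> entry for the object called `name` found in `env`
# (hypothetical helper; the XML layout is an assumption, see above)
keyword_entry = function(name, env = globalenv()) {
    obj = get(name, envir = env)
    # args() gives a function with the same formals, or NULL for a few primitives
    f = if (is.function(obj)) args(obj) else NULL
    if (is.null(f))
        return(sprintf("<KeyWord name=\"%s\" />", xml_escape(name)))
    params = sprintf("        <Param name=\"%s\" />",
        xml_escape(names(formals(f))))
    paste(c(
        sprintf("<KeyWord name=\"%s\" func=\"yes\">", xml_escape(name)),
        "    <Overload retVal=\"\">", params, "    </Overload>",
        "</KeyWord>"
    ), collapse = "\n")
}

# Entries for two functions and one non-function object from base R
entries = vapply(c("paste", "nchar", "pi"), keyword_entry, character(1),
    env = baseenv())
cat(entries, sep = "\n")
```

Looping over ls("package:base") and the other attached packages in the same way would give the body of the file; the header and the enclosing elements still have to be written around it.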

Apr 15, 2010

I came across this blog post just now: The Next Big Thing, and of course these words caught my attention:

[...] However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.

I don’t really understand how difficult (how much more difficult?) it can be to install and maintain R. Usually it takes about one minute to install it from the binary (and SAS? SPSS? Buy it, find a technician, install it, maintain it according to different licenses (single PC, server, or other types), continue to pay only tens of thousands of dollars next year, …). As for learning, it depends. I don’t think it is too difficult for people who know statistics well, and as for the rest, do they really feel safe doing something they do not understand? As for documentation, some people prefer simple documents and some prefer handbooks (SAS-style).

In all, I cannot see why R is an epic fail for the above reasons…

What? Data visualization?…

The R community must have been tired of comparing SAS with R. Please don’t tell Prof Frank Harrell about this post…

Mar 28, 2010

When we want to call external programs in R under Windows, we often need to know the paths of these programs. For instance, we may want to know where ImageMagick is installed, as we need the convert (convert.exe) utility to convert images to other formats, or where OpenBUGS is installed because we need this path to use the function bugs(). Usually this problem does not exist under Linux, because the executables (or their symbolic links) are often put in the directories which are in the environment variable PATH (e.g. /usr/bin, /usr/local/bin).

However, we may be able to find the paths through the registry if the installer saved the path information there. The R function to use is readRegistry():

## ImageMagick:
## I used this trick in the function saveMovie (the animation package)
> readRegistry("SOFTWARE\\ImageMagick\\Current")
$BinPath
[1] "C:\\Program Files\\ImageMagick"
$CoderModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\coders"
$ConfigurePath
[1] "C:\\Program Files\\ImageMagick\\config"
$FilterModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\filters"
$LibPath
[1] "C:\\Program Files\\ImageMagick"
$QuantumDepth
[1] 16
$Version
[1] "6.3.8"

## OpenBUGS
> r = names(readRegistry("Software\\Microsoft\\Windows\\ShellNoRoam\\MUICache",
+    "HCU"))
> dirname(r[grep("OpenBUGS\\.exe", r)])
[1] "C:/Program Files/OpenBUGS"

There is no guarantee that this approach works on every Windows platform, but I think it is better than explaining what the PATH variable is to some Windows users…
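
The two lookups can be combined into one function. The following is only a sketch under my own assumptions: find_convert() is a hypothetical name, and the registry key is the one from the example above, which may differ between ImageMagick versions. On Windows it deliberately does not fall back to the PATH, because Windows ships its own convert.exe (a file system utility) that would be found first.

```r
# Hypothetical helper: return the path to ImageMagick's convert, or "" if unknown
find_convert = function() {
    if (.Platform$OS.type == "windows") {
        # readRegistry() only exists in R for Windows, and it throws an
        # error when the key is missing, hence the tryCatch()
        reg = tryCatch(utils::readRegistry("SOFTWARE\\ImageMagick\\Current"),
            error = function(e) NULL)
        if (!is.null(reg$BinPath)) {
            exe = file.path(reg$BinPath, "convert.exe")
            if (file.exists(exe)) return(normalizePath(exe))
        }
        return("")
    }
    # elsewhere, search the directories listed in PATH;
    # Sys.which() returns "" when nothing is found
    unname(Sys.which("convert"))
}

path = find_convert()
if (nzchar(path)) message("convert found at: ", path) else message("convert not found")
```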

Mar 24, 2010

Amber Watkins gave me a suggestion on an animation for ratio estimation, and I think this is a good topic for my animation package. I’ve finished writing the initial version of the function sample.ratio() for this package, which will appear in version 1.1-2 in a couple of days.

As we know, the benefit of ratio estimation is that it can adjust for a skewed sample, because the estimate of \bar{Y} makes use of the relationship between X and Y: \bar{X} \cdot (\bar{y}/\bar{x}). Here is a demo (we can see that the ratio estimate, denoted by the red line, generally performs better than \bar{y}):

An animation demo for the ratio estimation
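
For readers without the animation at hand, the same comparison can be sketched with a small simulation. This is not the code of sample.ratio(); the population below (Y roughly proportional to X) and all object names are made up for illustration.

```r
set.seed(2010)
N = 1000                                # population size
X = rgamma(N, shape = 2, scale = 5)     # auxiliary variable, known for every unit
Y = 3 * X + rnorm(N, sd = 4)            # variable of interest, roughly proportional to X
n = 30                                  # sample size

# Draw many samples; for each one compute the plain sample mean of y
# and the ratio estimate: (population mean of X) * (ybar / xbar)
est = replicate(2000, {
    idx = sample(N, n)
    c(mean = mean(Y[idx]), ratio = mean(X) * mean(Y[idx]) / mean(X[idx]))
})

# Mean squared error of both estimators with respect to the true mean of Y
mse = rowMeans((est - mean(Y))^2)
print(mse)
```

When a sample happens to contain too many large values of X, \bar{y} is too large as well, but so is \bar{x}, and the ratio \bar{y}/\bar{x} stays close to the population ratio; that is why the ratio estimate should show the smaller error here.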

Feb 18, 2010

For a long time I’ve been wondering why we are not able to use Enter in the LyX Scrap environment which was set up by Gregor Gorjanc for Sweave. Two weeks ago, I (finally!) could not help asking Gregor about this issue, as I’m using “LyX + Sweave” more and more in my daily work. He explained it here: LyX-Sweave: mandatory use of control+enter in code chunks

After digging into the LyX customization manual for a while, I found a solution which allows us to press the Enter key just as we normally do when typing in a LyX document. The key is to use Environment instead of Paragraph as the LatexType in the style definition of Scrap. Besides, I used wrapsweave as the LatexName, since a LatexName is required by LyX. The definition of wrapsweave is simple: just two empty lines produced by \par. (If you define it as \newenvironment{wrapsweave}{}{}, you will sometimes run into trouble, especially when you use indentation for paragraphs.)

As we know, LaTeX environments cannot be centered in LyX (only paragraphs can), so I defined a special environment ScrapCenter for the cases when I want to insert graphics via Sweave and make them center-aligned.

Dec 31, 2009

I have to admit that the previous post on Christmas is actually not much fun. Today I received another pResent from Yixuan which is more interesting:

Dec 24, 2009

Life should be fun. I saw a post on the R-help list saying Merry Christmas to other useRs, and I followed up with some R code which can produce a naive animation like this:

Here is the code to generate the above Flash animation with shining Christmas:

library(animation)
saveSWF({
    n = length(speed <- runif(angle <- runif(x <- strsplit("MERRY CHRISTMAS",
        "")[[1]], 0, 360), 0, 15))
    for (j in 1:300) {
        angle = angle + speed
        plot.new()
        plot.window(c(1, n), c(0, 1))
        for (i in 1:n) text(i, 0.5, x[i], srt = angle[i], cex = runif(1,
            1, 4), col = sample(colors(), 1))
        text(n, 0, "Yihui @ 2009-12-24 (http://yihui.name)",
            adj = c(1, 0), col = "white", cex = 0.8)
    }
}, interval = 0.04, dev = "pdf", outdir = getwd(), para = list(mar = rep(0,
    4), bg = "black"), width = 8, height = 1)
## in animation package (>=1.1-0), see demo('Xmas')

There are other animation formats in the R package animation:

  1. use saveMovie() to get a GIF animation (needs ImageMagick)
  2. ani.start() and ani.stop() can produce an HTML page with the animation in it
  3. saveLatex() can embed an animation into a PDF document

Nov 11, 2009

Starting with animation 1.0-9, we can create a PDF document with an animation embedded in it; the function is saveLatex(), and its usage is similar to saveMovie() and saveSWF(): you pass an R expression for creating animations to this function, and this expression is evaluated inside the function while the image frames get recorded by a graphics device. In the end, a LaTeX document is written to a directory, and we can get a PDF document by running pdflatex on it.

In fact, the key point is the LaTeX package named animate, which can be used to insert image frames into a PDF document to generate an animation. The interface of animations created by this package is quite similar to the HTML animation page produced by the R package animation; moreover, it also uses JavaScript (in PDF) to animate the image frames.
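
The "pass an expression, evaluate it while a graphics device records the frames" idea can be sketched in base R alone. This is not how saveLatex() is implemented; record_frames() is a hypothetical helper which only shows the general pattern, and writing the LaTeX document around the frames is left out.

```r
# Hypothetical helper: evaluate `expr` with a PDF device open and
# return the paths of the image frames (one file per page)
record_frames = function(expr, outdir = tempdir(), prefix = "frame") {
    # "%03d" in the file name lets the device number the files itself
    pdf(file.path(outdir, paste0(prefix, "%03d.pdf")), onefile = FALSE,
        width = 5, height = 5)
    # expr is a promise, so it is only evaluated here, while the device is
    # open; the device gets closed even if the expression fails
    tryCatch(expr, finally = dev.off())
    list.files(outdir, pattern = paste0("^", prefix, "[0-9]+\\.pdf$"),
        full.names = TRUE)
}

out = tempfile("frames")
dir.create(out)
frames = record_frames({
    for (i in 1:3) plot(1:10, col = i, main = paste("Frame", i))
}, outdir = out)
print(basename(frames))
```

Every high-level plotting call starts a new page, so each iteration of the loop ends up in its own file; a package like animate can then pick up such a numbered sequence of frames.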

Oct 10, 2009

Today Romain Francois posted an interesting topic on the R-help list, and you can read his blog post for more details: celebrating R commit #50000. 50000 is certainly not a small number; we do owe the R core members a big “thank you” for their great efforts on this fantastic statistical language over the past 13 years. When I saw Romain’s data, I suddenly remembered a question I asked one of Prof Ripley’s students a couple of years ago: does Prof Ripley ever sleep? And he answered “No!”. No wonder we can see Prof Ripley so frequently on the R-help/devel mailing lists. If you have stayed on the R-help list for long enough, you’ll surely know several facts, e.g. Martin Maechler will arrive in less than 3 minutes if you dare call an R package a “library”, and you will get “Ripleyed” if you are not careful enough in posting your R code.

> library(fortunes)
> fortune("Ripleyed")

And the fear of getting Ripleyed on the mailing list also makes me think, read,
and improve before submitting half baked questions to the list.
 -- Eric Kort
 R-help (January 2006)

Sep 26, 2009

As Sir Francis Bacon said, “Histories make men wise; poets witty; the mathematics subtile[1]; natural philosophy deep; moral grave; logic and rhetoric able to contend.” And Windows stupid.

He would have added the last sentence had he been a Windows user in this age.

1. Avoid Using M$ Excel

A lot of R users ask this question: “How to import MS Excel data into R?” Well, my suggestion is, avoid using M$ Excel if you are a statistician (or going to be a statistician) because you just cannot imagine how messy Excel data can be: some cells might be merged, some are colored, some texts are bold, several data tables can be scattered anywhere in a sheet (e.g. cell (1,1) to (10,4), and (17,3) to (25,9)), stupid bar plots and pie charts are inserted in the sheets, silly statistical procedures that are wrong forever… If you don’t trust my words (yes, I’m a nobody), just read the examples here: Problems with Excel (collected by Prof Harrell).

I know there are reasons for you to continue using Excel. Your boss requires you to do so; you don’t have time to learn more about various data formats; everybody is using Excel, and you don’t want to be so cool as to use R; or if you finish your tasks too quickly and accurately, your boss will doubt whether you have really spent time working, hence you will get paid less (this is a REAL story for me: though I was not paid less, I was indeed doubted when I used R); …

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2010 by Yihui Xie