R Language
R is a free software environment for statistical computing and graphics.
Auto-completion is a nice feature to have in a text editor. Notepad++ does not support auto-completion for the R language, so I spent a couple of hours creating an XML file to support R:
Download: R.xml (938Kb)
Put it under ‘plugins/APIs‘ in the installation directory of Notepad++ (you can see several other XML files there supporting different languages such as C), and make sure you have enabled auto-completion in Notepad++ (Settings --> Preferences --> Backup/Auto-completion). Open an R script and start typing a familiar function (e.g. paste()), and you will see some candidates in a drop-down list like this:
Hit the Enter key if the function name selected in the list is correct for you, then type ‘(‘ and you will see hints for parameters:
The file R.xml was actually generated from R; it contains almost all visible R objects in the base R packages as well as in recommended packages like MASS. You can create an extended XML file yourself (containing keywords from other packages) by loading the packages you need into your current workspace and then running:
source('http://yihui.name/en/wp-content/uploads/2010/08/Npp_R_Auto_Completion.r')
# R.xml will be generated under your current working directory: getwd()
As every useR knows, the useR! 2010 conference is being held at NIST in Gaithersburg these days. I have just finished my talk on the R package animation this afternoon. Here are my slides and R code for those who are interested:
Download: Slides (1.6M), and R code (3.6K). Note you may need Acrobat Reader to watch the animations inside the slides.
Have fun, even if you are a PhD!
Base R has the function chull(), which returns (the indices of) the convex hull of a set of points. Now we can use the R package alphahull to compute the α-convex hull. For those who are not familiar with the α-convex hull, the animation below might be a good illustration of the difference between a convex hull and an α-convex hull. Note how the parameter α affects the shape of the hull:
The above animation can be reproduced with the code below (uncomment the lines to create a GIF animation with the animation package):
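A smaller sketch covering only the ordinary convex hull, using base R's chull() on made-up random points (it does not use alphahull and is not the animation code; the names pts, idx and hull are just for illustration):

```r
# Random points in the unit square (made-up data for illustration)
set.seed(1)
pts <- cbind(x = runif(50), y = runif(50))

# chull() returns the row indices of the hull vertices, in clockwise order
idx <- chull(pts)

# Close the polygon by repeating the first vertex, then draw it
hull <- pts[c(idx, idx[1]), ]
plot(pts, pch = 19, main = "Convex hull via chull()")
lines(hull, col = "red")
```

The α-convex hull would replace the red polygon with a shape that can bend inwards as α shrinks.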
I came across this blog post just now: The Next Big Thing, and of course these words caught my attention:
[...] However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.
I don’t really understand how (much more?) difficult it is to install and maintain R. Usually it takes about one minute to install it from the binary (and SAS? SPSS? buy it, find a technician, install it, maintain it according to different licenses – single PC or server or other types, continue to pay only tens of thousands of dollars next year, …). For learning, it depends. I don’t think it is too difficult for people who know statistics well, and for the rest, do they really feel safe doing something they do not understand? For the documentation, some people prefer simple documents and some prefer handbooks (SAS-style).
In all, I cannot see why R is an epic fail for the above reasons…
What? Data visualization?…
The R community must have been tired of comparing SAS with R. Please don’t tell Prof Frank Harrell about this post…
It is not uncommon to see messy R code that is barely human-readable, like this:
# rotation of the word "Animation"
# in a loop; change the angle and color
# step by step
for (i in 1:360) {
# redraw the plot again and again
plot(1,ann=FALSE,type="n",axes=FALSE)
# rotate; use rainbow() colors
text(1,1,"Animation",srt=i,col=rainbow(360)[i],cex=7*i/360)
# pause for a while
Sys.sleep(0.01)}
Apparently it is a pain to read unformatted R code, but on the other hand, it is natural for us to be lazy. I don’t care about adding spaces or indentation to my raw R code; I’ll concentrate on programming first and format my code later. The R package ‘formatR‘ is intended to help us format our messy R code. Two lines of R code will show you the graphical interface of formatR:
# formatR depends on RGtk+, will be installed automatically
# please use the latest version of R (>=2.10.1)
install.packages('formatR')
library(formatR)
# or formatR()
Then you can either paste your code into the text box or click the “Open” button to open an existing R code file. Click the “Convert” button and you are done!
There are several options in the “Preferences” panel, e.g. you can specify whether to keep comments or blank lines, or specify the width of the formatted R code.
No matter how messy your code looks, formatR can make it tidy and structured as long as there are no syntax errors in your R code. If you prefer the command line interface, you may want to take a look at the function tidy.source() in the animation package.
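For the command line route, here is a minimal sketch. It assumes a recent version of formatR is installed, where the function is named tidy_source(); the variable names messy and tidy are made up, and the call is skipped when the package is not available:

```r
# The messy loop from above, stored as a character vector (comments dropped)
messy <- c(
  "for (i in 1:360) {",
  "plot(1,ann=FALSE,type=\"n\",axes=FALSE)",
  "text(1,1,\"Animation\",srt=i,col=rainbow(360)[i],cex=7*i/360)",
  "Sys.sleep(0.01)}"
)

# Assumption: formatR is installed; otherwise nothing happens here
if (requireNamespace("formatR", quietly = TRUE)) {
  # output = FALSE suppresses printing; the result is returned as a list
  tidy <- formatR::tidy_source(text = messy, width.cutoff = 60, output = FALSE)
  cat(tidy$text.tidy, sep = "\n")
}
```

Only the layout changes; the code has to parse without syntax errors before it can be reformatted.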
Currently there are problems with the encoding of multi-byte characters, and I have not figured out how to deal with them.
Here is my personal list of rules of thumb for people who want to meet some R gurus (quickly) in the R help mailing list (R-help@R-project.org):
- If you want to meet Dr Bill Venables, just say something about Type III Sum of Squares (better if you also mention the “unbeatable” SAS);
- If you want to meet Prof Douglas Bates, say something about LSMEANS (of course, with SAS) and P-values for the fixed effects in lmer() (or wait in the mixed-models group r-sig-mixed-models@r-project.org, where he often shows up);
- If you want to meet Prof Frank Harrell Jr, say SAS is unbeatable (or efficient, golden-standard, high-quality graphics, whatever);
- If you want to meet Dr Martin Mächler, say something like “I need help on a library called ***” (it is said that he would show up in 5 mins upon such mistakes, but I feel he is tired of correcting people who don’t know the difference between a “package” and a “library” now);
- If you want to meet Prof Brian Ripley (the-professor-on-whom-the-sun-never-sets), well, I guess you can say anything, because he is so devoted to the mailing list that you can see him a.e., but you have to be careful enough not to be “Ripleyed”.
I’ve been reading the mailing list for about 2 years, so I may not know enough about all the gurus. Let me know if I missed anyone. The above list is not meant seriously; my real point is that I have learned a lot from their advice and arguments.
We know the real distribution of the F statistic in linear models: it is a non-central F distribution, which reduces to a central F distribution under H0. Given the significance level α, the critical value is the 1 – α quantile of the central F distribution, and the power is the probability of (correctly) rejecting H0, i.e. the probability that the non-central F statistic exceeds that critical value. I created a simple demo to illustrate how the power changes as other parameters vary, e.g. the degrees of freedom, the non-centrality parameter and α. Here is the video:
And for those who might be interested, here is the code (you need to install the gWidgets package first and I recommend the RGtk2 interface). Have fun:
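The GUI aside, the calculation behind such a demo can be sketched in a few lines of base R. This is not the gWidgets code, just a minimal illustration; the function name f_power and the example degrees of freedom are made up:

```r
# Power of the F test: P(F > critical value) under the non-central F
f_power <- function(df1, df2, ncp, alpha = 0.05) {
  # critical value: the 1 - alpha quantile of the central F distribution
  crit <- qf(1 - alpha, df1, df2)
  # upper tail probability of the non-central F beyond the critical value
  pf(crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

# Power as a function of the non-centrality parameter
ncp <- c(0, 2, 5, 10, 20)
round(f_power(df1 = 3, df2 = 20, ncp = ncp), 3)
```

With a non-centrality parameter of 0 the power is just α, and it grows as the non-centrality parameter increases.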
When we want to call external programs in R under Windows, we often need to know the paths of these programs. For instance, we may want to know where ImageMagick is installed, as we need the convert (convert.exe) utility to convert images to other formats, or where OpenBUGS is installed because we need this path to use the function bugs(). Usually this problem does not exist under Linux, because the executables (or their symbolic links) are often put in the directories which are in the environment variable PATH (e.g. /usr/bin, /usr/local/bin).
However, we may be able to find the paths through the Windows registry, provided the installer saved the path information there. The R function for this is readRegistry():
## ImageMagick:
## I used this trick in the function saveMovie (the animation package)
> readRegistry("SOFTWARE\\ImageMagick\\Current")
$BinPath
[1] "C:\\Program Files\\ImageMagick"
$CoderModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\coders"
$ConfigurePath
[1] "C:\\Program Files\\ImageMagick\\config"
$FilterModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\filters"
$LibPath
[1] "C:\\Program Files\\ImageMagick"
$QuantumDepth
[1] 16
$Version
[1] "6.3.8"
## OpenBUGS
> r = names(readRegistry("Software\\Microsoft\\Windows\\ShellNoRoam\\MUICache",
+ "HCU"))
> dirname(r[grep("OpenBUGS\\.exe", r)])
[1] "C:/Program Files/OpenBUGS"
There is no guarantee that this approach works on every Windows platform, but I think it is better than explaining what the PATH variable is to some Windows users…
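The two lookups can be combined into one helper. This is only a sketch: find_convert() is a hypothetical name, it assumes the same registry key as in the output above, and it returns an empty string when nothing is found:

```r
# Locate ImageMagick's convert utility; returns "" if it cannot be found
find_convert <- function() {
  if (.Platform$OS.type == "windows") {
    # Try the registry first: on Windows, PATH may lead to the unrelated
    # system tool convert.exe instead of ImageMagick
    reg <- tryCatch(
      utils::readRegistry("SOFTWARE\\ImageMagick\\Current"),
      error = function(e) NULL  # key missing: ImageMagick not registered
    )
    if (!is.null(reg$BinPath)) {
      return(file.path(reg$BinPath, "convert.exe"))
    }
  }
  # Elsewhere (or as a fallback) search the directories listed in PATH
  unname(Sys.which("convert"))
}

find_convert()
```

Checking the registry before PATH on Windows is a deliberate choice here, because a PATH search alone can pick up the wrong convert.exe.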
animation package. I’ve finished writing the initial version of the function sample.ratio() for this package, which will appear in version 1.1-2 in a couple of days.
As we know, the benefit of ratio estimation is that sampling skewness may be adjusted for, because the estimation of $\bar{Y}$ will make use of the information in the relationship of X and Y: $\hat{\bar{Y}}_R = \bar{X} \cdot \bar{y} / \bar{x}$. Here is a demo (we can see the ratio estimate, denoted by the red line, generally performs better than the simple sample mean $\bar{y}$):
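A small simulation can illustrate the idea. This is a sketch that does not use sample.ratio(); the population below is made up, with Y roughly proportional to X, and the ratio estimate of the population mean of Y is taken to be the population mean of X times the ratio of the sample means:

```r
set.seed(2010)

# A made-up population in which Y is roughly proportional to X
N <- 1000
X <- runif(N, 1, 10)
Y <- 2 * X + rnorm(N)

n <- 30  # sample size
est <- replicate(2000, {
  i <- sample(N, n)  # simple random sample without replacement
  c(mean  = mean(Y[i]),                         # plain sample mean
    ratio = mean(X) * mean(Y[i]) / mean(X[i]))  # ratio estimate
})

# Mean squared error of both estimators around the true population mean
mse <- rowMeans((est - mean(Y))^2)
mse
```

The ratio estimate needs the population mean of X to be known; when X and Y are only weakly related, the advantage shrinks or disappears.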
pgfSweave, I began to notice the font families in my graphs when writing Sweave documents. The default font family for PDF graphs is Helvetica, which is, in most cases (I think), inconsistent with the LaTeX font styles. Some common font families are listed in ?postscript, and we can take a look at them with:
for (f in c("AvantGarde", "Bookman", "Courier", "Helvetica",
    "Helvetica-Narrow", "NewCenturySchoolbook", "Palatino", "Times")) {
    pdf.options(family = f)
    pdf(paste(f, ".pdf", sep = ""))
    set.seed(123)
    plot(rnorm(25), pch = 1:25, xlab = "xlab family", ylab = "ylab font",
        main = paste("Font Families in R (PDF):", f))
    text(13, 0, "Text in the Middle")
    mtext(sprintf("pdf.options(family = \"%s\")", f), side = 4)
    dev.off()
}
Here is a merged PDF containing the above single PDF files:
It seems that "Bookman", "NewCenturySchoolbook", "Palatino" and "Times" can be better choices when using Sweave because they are serif fonts, which are usually more consistent with LaTeX PDF.
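For a single graph, the family can also be passed to pdf() directly instead of changing the global options. A minimal sketch, where the output file name is made up and the file goes to a temporary directory:

```r
# Write one plot to a PDF that uses a serif family
out <- file.path(tempdir(), "serif-demo.pdf")  # hypothetical file name
pdf(out, family = "Palatino")
set.seed(123)
plot(rnorm(25), pch = 1:25, main = "Palatino in a PDF device")
dev.off()
file.exists(out)
```

This leaves pdf.options() untouched, so later graphs keep the default family.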





