R Programming

Pure computer programming with R

Apr 13 2010

It is not uncommon to see messy R code that is barely human-readable, like this:

 # rotation of the word "Animation"
# in a loop; change the angle and color
# step by step
for (i in 1:360) {
 # redraw the plot again and again
plot(1,ann=FALSE,type="n",axes=FALSE)
# rotate; use rainbow() colors
text(1,1,"Animation",srt=i,col=rainbow(360)[i],cex=7*i/360)
# pause for a while
Sys.sleep(0.01)}

Reading unformatted R code is clearly a pain, but on the other hand, it is natural for us to be lazy. I don’t care about adding spaces or indentation to my raw R code; I concentrate on programming first and format my code later. The R package formatR is intended to help us format messy R code. Two lines of R code will show you the graphical interface of formatR:

# formatR depends on RGtk2, which will be installed automatically
# please use the latest version of R (>=2.10.1)
install.packages('formatR')
library(formatR)
# or formatR()

Then you can either paste your code into the text box or click the “Open” button to open an existing R code file. Click the “Convert” button and you are done!

formatR: unformatted R code

formatR: tidy R code

There are several options in the “Preferences” panel, e.g. you can specify whether to keep comments or blank lines, or specify the width of the formatted R code.

No matter how messy your code looks, formatR can make it tidy and structured, as long as there are no syntax errors in it. If you prefer a command-line interface, take a look at the function tidy.source() in the animation package.

Currently there are problems with the encoding of multi-byte characters, and I have not figured out how to deal with them.

Apr 03 2010

We know the true distribution of the F statistic in linear models: it is a non-central F distribution, which reduces to a central F distribution under H0. Given 1 – α, the critical value is the corresponding quantile of the central F distribution, and the power, i.e. the probability of (correctly) rejecting H0, is the probability that the non-central F statistic exceeds this critical value. I created a simple demo to illustrate how the power changes as other parameters vary, e.g. the degrees of freedom, the non-centrality parameter and α. Here is the video:

The Power of F Test

And for those who might be interested, here is the code (you need to install the gWidgets package first and I recommend the RGtk2 interface). Have fun:
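
The power calculation behind the demo needs only base R. Below is a minimal sketch of that calculation alone (it is not the gWidgets demo; the function name f_power and its argument names are made up for illustration):

```r
# Power of the F test: P(F > critical value) under a non-central F distribution.
# f_power() is a hypothetical helper, not part of any package.
f_power <- function(alpha, df1, df2, ncp) {
  # critical value: the (1 - alpha) quantile of the central F distribution (H0)
  crit <- qf(1 - alpha, df1, df2)
  # probability of exceeding it when the non-centrality parameter is ncp
  pf(crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

# with ncp = 0 the power is just the significance level alpha
f_power(0.05, df1 = 3, df2 = 20, ncp = 0)

# the power grows with the non-centrality parameter
ncps <- c(0, 2, 5, 10, 20)
powers <- sapply(ncps, function(l) f_power(0.05, df1 = 3, df2 = 20, ncp = l))
powers
```

Varying alpha, df1 or df2 in the same way shows how each of them moves the power.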

Mar 28 2010

When we call external programs from R under Windows, we often need to know where these programs are installed. For instance, we may want to know where ImageMagick is installed, as we need the convert (convert.exe) utility to convert images to other formats, or where OpenBUGS is installed, because the function bugs() needs this path. This problem usually does not exist under Linux, because the executables (or symbolic links to them) are typically placed in directories listed in the environment variable PATH (e.g. /usr/bin, /usr/local/bin).

However, we may be able to find the paths through the Windows registry, provided the installer saves the path information there. The R function for this is readRegistry():

## ImageMagick:
## I used this trick in the function saveMovie (the animation package)
> readRegistry("SOFTWARE\\ImageMagick\\Current")
$BinPath
[1] "C:\\Program Files\\ImageMagick"
$CoderModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\coders"
$ConfigurePath
[1] "C:\\Program Files\\ImageMagick\\config"
$FilterModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\filters"
$LibPath
[1] "C:\\Program Files\\ImageMagick"
$QuantumDepth
[1] 16
$Version
[1] "6.3.8"

## OpenBUGS
> r = names(readRegistry("Software\\Microsoft\\Windows\\ShellNoRoam\\MUICache",
+    "HCU"))
> dirname(r[grep("OpenBUGS\\.exe", r)])
[1] "C:/Program Files/OpenBUGS"

There is no guarantee that this approach works on every Windows platform, but I think it is better than explaining what the PATH variable is to some Windows users…
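
The lookup can be wrapped into a small helper that tries the registry on Windows and falls back to PATH elsewhere. This is only a sketch under assumptions: the name find_convert is made up, and the registry key is the one shown in the ImageMagick output above, which other ImageMagick versions may not use.

```r
# Locate ImageMagick's convert utility; returns "" if it cannot be found.
# find_convert() is a hypothetical helper, not a function of any package.
find_convert <- function() {
  if (.Platform$OS.type == "windows") {
    # On Windows, PATH may resolve "convert" to the system's own convert.exe
    # (a disk utility), so ask the registry first.
    reg <- tryCatch(
      utils::readRegistry("SOFTWARE\\ImageMagick\\Current"),
      error = function(e) NULL  # key missing: ImageMagick not registered
    )
    if (!is.null(reg$BinPath)) {
      exe <- file.path(reg$BinPath, "convert.exe")
      if (file.exists(exe)) return(exe)
    }
    return("")
  }
  # Elsewhere the executable is normally on PATH.
  path <- unname(Sys.which("convert"))
  if (is.na(path)) "" else path
}

convert_path <- find_convert()
if (nzchar(convert_path)) {
  message("convert found at: ", convert_path)
} else {
  message("convert not found; please specify its path manually")
}
```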

Mar 23 2010

Here are some (trivial) R tips for the course Stat 511. I’ll update this post until the semester is over.

  1. Formatting R Code

    I submitted an R package named formatR to CRAN yesterday. It should be easier to use than the code below, because it has a GUI to tidy your R code. Install it with install.packages('formatR').

    Reading code is a pain, but well-formatted code might alleviate the pain a little. The function tidy.source() in the animation package can format our R code automatically. By default it reads the code in the clipboard, parses it and returns the well-formatted code. There are options to keep or remove comments and blank lines, to set the width of the code, etc. Spaces and indentation are added automatically, which saves us the time of typing spaces and paying attention to indentation.

    ## install.packages('animation') if it is not installed yet
    library(animation)
    ## copy some R code somewhere and type:
    tidy.source()
    ## or specify the path of your code file
    tidy.source(file.path(system.file(package = "graphics"), "demo", "image.R"))
    ## can also use a URL
    tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R')
    ## remove blank lines
    tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R',
               keep.blank.line = FALSE)
    ## remove comments
    tidy.source('http://www.public.iastate.edu/~dnett/S511/twofactor.R',
               keep.comment = FALSE)
    
Feb 18 2010

For a long time I’ve been wondering why we are not able to use Enter in the LyX Scrap environment which was set up by Gregor Gorjanc for Sweave. Two weeks ago, I (finally!) could not help asking Gregor about this issue, as I’m using “LyX + Sweave” more and more in my daily work. He explained it here: LyX-Sweave: mandatory use of control+enter in code chunks

After digging into the LyX customization manual for a while, I found a solution which allows us to press the Enter key just as we normally do when typing in a LyX document. The key is to use Environment instead of Paragraph as the LatexType in the style definition of Scrap. In addition, I set the LatexName to wrapsweave, as LyX requires a LatexName. The definition of wrapsweave is simple: just two paragraph breaks produced by \par. (If you define it as \newenvironment{wrapsweave}{}{}, you will sometimes run into trouble, especially when paragraphs are indented.)

As we know, a LaTeX environment cannot be centered in LyX (only paragraphs can), so I defined a special environment, ScrapCenter, for inserting graphics via Sweave and center-aligning them.

Nov 11 2009

Since animation 1.0-9, we can create a PDF document with an animation embedded in it. The function is saveLatex(), and its usage is similar to that of saveMovie() and saveSWF(): you pass an R expression that creates the animation to the function, the expression is evaluated inside it, and the image frames are recorded by a graphics device. In the end, a LaTeX document is written to a directory, and we get a PDF document by running pdflatex on it.
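
The frame-recording step can be imitated in base R. The sketch below is not how saveLatex() is implemented; it only illustrates the idea of letting a graphics device write one image file per frame (the directory, file names and number of frames are arbitrary choices):

```r
# Record animation frames as separate PDF files with the pdf() device.
frame_dir <- tempfile("frames")   # arbitrary output directory
dir.create(frame_dir)
n_frames <- 10                    # arbitrary number of frames

# onefile = FALSE makes the device write a new file for every page;
# %03d in the file name is replaced by the page number
pdf(file.path(frame_dir, "frame%03d.pdf"), onefile = FALSE,
    width = 5, height = 5)
for (i in seq_len(n_frames)) {
  # each call to plot() starts a new page, i.e. a new frame
  plot(1, ann = FALSE, type = "n", axes = FALSE)
  text(1, 1, "Animation", srt = 360 * i / n_frames,
       col = rainbow(n_frames)[i], cex = 3 * i / n_frames)
}
invisible(dev.off())

frames <- list.files(frame_dir, pattern = "^frame[0-9]{3}\\.pdf$")
frames
```

A sequence of numbered files like this is the kind of input from which an animation can then be assembled.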

In fact, the key ingredient is the LaTeX package animate, which inserts image frames into a PDF document to generate an animation. The interface of the animations created by this package is quite similar to that of the HTML animation pages produced by the R package animation; moreover, it also uses JavaScript (in the PDF) to animate the image frames.

Sep 26 2009

As Sir Francis Bacon said, “Histories make men wise; poets witty; the mathematics subtile[1]; natural philosophy deep; moral grave; logic and rhetoric able to contend.” And Windows stupid.

He would have added that last sentence had he been a Windows user in this age.

1. Avoid Using M$ Excel

Many R users ask this question: “How to import MS Excel data into R?” Well, my suggestion is: avoid using M$ Excel if you are a statistician (or going to be one), because you just cannot imagine how messy Excel data can be: some cells are merged, some are colored, some text is bold, several data tables are scattered around a sheet (e.g. cells (1,1) to (10,4), and (17,3) to (25,9)), stupid bar plots and pie charts are inserted in the sheets, silly statistical procedures that are wrong forever… If you don’t trust my words (yes, I’m a nobody), just read the examples here: Problems with Excel (collected by Prof Harrell).

I know there are reasons for you to continue using Excel. Your boss requires you to do so; you don’t have time to learn about other data formats; everybody is using Excel, and you don’t want to be the cool one who uses R; or if you finish your tasks too quickly and accurately, your boss will doubt whether you have really spent time working, and you will be paid less (this is a REAL story for me: though I was not paid less, I was indeed doubted when I used R); …

Aug 31 2009

Yanping Chen raised a question in the Chinese COS forum about the output of EViews: how can we (re)format the decimal coefficients in equations printed as text? For example, we want to round the numbers in CC = 16.5547557654 + 0.0173022117998*PP + 0.216234040485 * PP(-1) + 0.810182697599 * (WP + WG) to three decimal places. This can be done simply with regular expressions, as the fractional part of a decimal always begins with a “.”. The basic steps are:

  1. find where the decimals are in the character string;
  2. format them;
  3. replace the original decimals with the formatted values.

Given a character vector, we can format the decimals with the code below:
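
A minimal sketch of the three steps in base R, assuming decimals are written as digits, a “.”, and more digits (the function name format_decimals and its digits argument are made up for illustration):

```r
# Round every decimal number found in a character vector.
# format_decimals() is a hypothetical helper, not part of any package.
format_decimals <- function(x, digits = 3) {
  # step 1: locate the decimals (digits, a dot, then digits)
  m <- gregexpr("[0-9]+\\.[0-9]+", x)
  # step 2: format each match to the requested number of decimal places
  fmt <- paste0("%.", digits, "f")
  formatted <- lapply(regmatches(x, m), function(s) sprintf(fmt, as.numeric(s)))
  # step 3: put the formatted values back in place of the originals
  regmatches(x, m) <- formatted
  x
}

eq <- "CC = 16.5547557654 + 0.0173022117998*PP + 0.216234040485 * PP(-1) + 0.810182697599 * (WP + WG)"
format_decimals(eq)
# [1] "CC = 16.555 + 0.017*PP + 0.216 * PP(-1) + 0.810 * (WP + WG)"
```

Integers such as the lag in PP(-1) contain no “.” and are therefore left untouched.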

Jun 12 2009

Linlin Yan posted a cool (hot?) simulation of burning fire with R in the COS forum yesterday, which was indeed a warm welcome. I’m not sure whether our forum members will be scared by the “fire” under the title “Welcome to COS Forum”. The fire was mainly created by the function image(), applied to carefully designed rows and columns and colored with heat.colors(). Here is one of the pictures generated from his code:

Simulation of Burning Fire in R
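
The general idea of drawing a matrix of “heat” values with image() and heat.colors() can be sketched as follows. This is not Linlin Yan’s code; the grid size and the cooling rule are arbitrary assumptions made only for illustration:

```r
# A toy "fire": heat rises from a hot bottom row and cools on the way up.
set.seed(1)
nr <- 40   # number of rows (height), arbitrary
nc <- 60   # number of columns (width), arbitrary
fire <- matrix(0, nr, nc)
fire[1, ] <- runif(nc)   # random hot base in the bottom row

for (r in 2:nr) {
  below <- fire[r - 1, ]
  left  <- c(below[1], below[-nc])   # neighbour to the left (edge repeated)
  right <- c(below[-1], below[nc])   # neighbour to the right (edge repeated)
  # average the three cells below, then subtract a random amount of cooling
  fire[r, ] <- pmax(0, (left + below + right) / 3 - runif(nc, 0, 0.05))
}

# image() maps rows of its argument to the x axis, so transpose the matrix:
# the hot base ends up at the bottom of the plot
image(t(fire), col = heat.colors(64), axes = FALSE)
```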

Jun 10 2009

A tag cloud is a bunch of words drawn in a graph with their sizes proportional to their frequencies; it is widely used in blogs to visualize tags. We can spot important words quickly in a tag cloud, as they appear in a large font size. Tony N. Brown asked how to “graphically represent frequency of words in a speech” the other day on the R-help list, which is actually a question about tag clouds:

I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. [...]
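
The basic idea, sizes proportional to frequencies, takes only a few lines of base R if we ignore the placement of the words. A naive sketch with a made-up piece of text and random positions (so the labels may overlap):

```r
# Toy text, made up purely for illustration
speech <- "statistics graphics statistics data graphics statistics models data statistics"

# split into words and count how often each one occurs
words <- strsplit(tolower(speech), "[^a-z]+")[[1]]
words <- words[nzchar(words)]
freq <- sort(table(words), decreasing = TRUE)

# random positions; nothing here prevents overlapping labels
set.seed(1)
x <- runif(length(freq))
y <- runif(length(freq))

plot(x, y, type = "n", axes = FALSE, ann = FALSE,
     xlim = c(-0.2, 1.2), ylim = c(-0.2, 1.2))
# character size proportional to the word frequency
text(x, y, labels = names(freq), cex = 4 * as.numeric(freq) / max(freq))
```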

Marc Schwartz mentioned that Gregor Gorjanc did some work on this years ago using R (in grid graphics). The obstacle to creating a tag cloud in R, as Gorjanc wrote, lies in deciding the placement of the words; it would be much easier for other applications such as browsers to arrange the text. That’s true: there are already a lot of mature programs for tag clouds. One of them is the wp-cumulus plugin for WordPress, which uses a Flash object to generate the tag cloud and has a fantastic 3D rotation effect.

1. Arranging text labels with pointLabel()

Before introducing how to port the plugin into R, I’d like to introduce the R function pointLabel() in the maptools package, which can partially solve the problem of arranging text labels in a plot (using simulated annealing or a genetic algorithm). Here is a simulated example:

Simulated Tag Cloud with R function pointLabel() in maptools

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2010 by Yihui Xie