R Programming
Pure computer programming with R
For a long time I’ve been wondering why we are not able to use Enter in the LyX Scrap environment which was set up by Gregor Gorjanc for Sweave. Two weeks ago, I (finally!) could not help asking Gregor about this issue, as I’m using “LyX + Sweave” more and more in my daily work. He explained it here: LyX-Sweave: mandatory use of control+enter in code chunks
After digging into the LyX customization manual for a while, I found a solution which allows us to press the Enter key just as we normally do when typing in a LyX document. The key is to use Environment instead of Paragraph as the LatexType in the style definition of Scrap. I also set the LatexName to wrapsweave, because LyX requires a LatexName. The definition of wrapsweave is simple: just two empty lines produced by \par. (If you define it as \newenvironment{wrapsweave}{}{}, you will sometimes run into trouble, especially when paragraphs are indented.)
As we know, LaTeX environments cannot be centered in LyX (only paragraphs can), so I defined a special environment, ScrapCenter, for when I want to insert graphics via Sweave and center them.
Since animation 1.0-9, we can create a PDF document with an animation embedded in it. The function is saveLatex(), and its usage is similar to saveMovie() and saveSWF(): you pass an R expression that creates the animation to this function, the expression is evaluated inside the function, and the image frames are recorded by a graphics device. In the end, a LaTeX document is written to a directory, and we can get a PDF document by running pdflatex on it.
In fact, the key is the LaTeX package animate, which inserts image frames into a PDF document to produce an animation. The interface of the animations created by this package is quite similar to the HTML animation page produced by the R package animation; it also uses JavaScript (in PDF) to animate the image frames.
As Sir Francis Bacon said, “Histories make men wise; poets witty; the mathematics subtile; natural philosophy deep; moral grave; logic and rhetoric able to contend.” And Windows stupid.
He would surely have added that last sentence had he been a Windows user in this age.
1. Avoid Using M$ Excel
Many R users ask this question: “How to import MS Excel data into R?” Well, my suggestion is to avoid using M$ Excel if you are a statistician (or going to be one), because you just cannot imagine how messy Excel data can be: some cells might be merged, some are colored, some text is bold, several data tables can be scattered anywhere in a sheet (e.g. cell (1,1) to (10,4), and (17,3) to (25,9)), stupid bar plots and pie charts are inserted in the sheets, silly statistical procedures that are wrong forever… If you don’t trust my words (yes, I’m a nobody), just read the examples here: Problems with Excel (collected by Prof Harrell).
I know there are reasons for you to continue using Excel. Your boss requires you to do so; you don’t have time to learn about other data formats; everybody is using Excel, and you don’t want to be so cool as to use R; or if you finish your tasks too quickly and accurately, your boss will doubt whether you have really spent time working, hence you will get paid less (this is a REAL story for me: though I was not paid less, I was indeed doubted when I used R); …
Yanping Chen raised a question in the Chinese COS forum about the output of Eviews: how to (re)format the decimal coefficients in equations printed as text? For example, we want to round the numbers in CC = 16.5547557654 + 0.0173022117998*PP + 0.216234040485 * PP(-1) + 0.810182697599 * (WP + WG) to 3 decimal places. This can be done simply with regular expressions, as every decimal contains a “.” between digits. The basic steps are:
- find out where the decimals are in the character string;
- format them;
- replace the original decimals with the formatted values.
Given a character vector, we can format the decimals with the code below:
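A minimal sketch of these three steps in base R, using gregexpr() to locate the decimals and regmatches() to replace them. The helper name round_decimals() and the pattern are assumptions made for illustration, not necessarily the code from the forum thread:

```r
# Round every decimal number found in a character vector.
# `round_decimals` is a hypothetical helper name used only for this sketch.
round_decimals <- function(x, digits = 3) {
  # a decimal here means: digits, a dot, then more digits
  pattern <- "[0-9]+\\.[0-9]+"
  # step 1: find out where the decimals are
  m <- gregexpr(pattern, x)
  # steps 2 and 3: format the matches and write them back in place
  fmt <- paste0("%.", digits, "f")
  regmatches(x, m) <- lapply(regmatches(x, m), function(s) {
    sprintf(fmt, as.numeric(s))
  })
  x
}

eq <- "CC = 16.5547557654 + 0.0173022117998*PP + 0.216234040485 * PP(-1) + 0.810182697599 * (WP + WG)"
round_decimals(eq)
# "CC = 16.555 + 0.017*PP + 0.216 * PP(-1) + 0.810 * (WP + WG)"
```

The lag index in PP(-1) is left alone because the pattern requires digits on both sides of the dot.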
The fire was mainly created by the function image(), with carefully designed rows and columns filled with heat colors from heat.colors(). Here is one of the pictures generated from his code:
I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. [...]
Marc Schwartz mentioned that Gorjanc Gregor did some work on this years ago using R (in grid graphics). The obstacle to creating a tag cloud in R, as Gorjanc wrote, lies in deciding the placement of the words, and it would be much easier for other applications such as browsers to arrange the text. That’s true: there are already a lot of mature programs for making tag clouds. One of them is the wp-cumulus plugin for WordPress, which uses a Flash object to generate the tag cloud and gives it a fantastic 3D rotation effect.
1. Arranging text labels with pointLabel()
Before introducing how to port the plugin to R, I’d like to introduce the R function pointLabel() in the maptools package, which can partially solve the problem of arranging text labels in a plot (using simulated annealing or a genetic algorithm). Here is a simulated example:
After a few hours’ work, I modified the function tidy.source() in the animation package so that it can preserve complete comment lines. See the tidy.source() wiki page for an example.
tidy.source <- function(source = "clipboard", keep.comment = TRUE,
    keep.blank.line = FALSE, begin.comment, end.comment, ...) {
    # parse and deparse the code
    tidy.block = function(block.text) {
        exprs = parse(text = block.text)
        n = length(exprs)
        res = character(n)
        # seq_len() is safe when there are no expressions at all
        for (i in seq_len(n)) {
            dep = paste(deparse(exprs[i]), collapse = "\n")
            # strip the "expression(" prefix and the closing ")"
            res[i] = substring(dep, 12, nchar(dep) - 1)
        }
        return(res)
    }
    text.lines = readLines(source, warn = FALSE)
    if (keep.comment) {
        # identifier for comments
        identifier = function() paste(sample(LETTERS), collapse = "")
        if (missing(begin.comment))
            begin.comment = identifier()
        if (missing(end.comment))
            end.comment = identifier()
        # remove leading and trailing white spaces
        text.lines = gsub("^[[:space:]]+|[[:space:]]+$", "",
            text.lines)
        # make sure the identifiers are not in the code
        # or the original code might be modified
        while (length(grep(sprintf("%s|%s", begin.comment, end.comment),
            text.lines))) {
            begin.comment = identifier()
            end.comment = identifier()
        }
        head.comment = substring(text.lines, 1, 1) == "#"
        # add identifiers to comment lines to cheat R parser
        if (any(head.comment)) {
            text.lines[head.comment] = gsub("\"", "\'", text.lines[head.comment])
            text.lines[head.comment] = sprintf("%s=\"%s%s\"",
                begin.comment, text.lines[head.comment], end.comment)
        }
        # keep blank lines?
        blank.line = text.lines == ""
        if (any(blank.line) && keep.blank.line)
            text.lines[blank.line] = sprintf("%s=\"%s\"", begin.comment,
                end.comment)
        text.tidy = tidy.block(text.lines)
        # remove the identifiers
        text.tidy = gsub(sprintf("%s = \"|%s\"", begin.comment,
            end.comment), "", text.tidy)
    }
    else {
        text.tidy = tidy.block(text.lines)
    }
    cat(paste(text.tidy, collapse = "\n"), "\n", ...)
    invisible(text.tidy)
}
Note that inline comments will still be removed. I don’t want to spend any more time on dealing with inline comments.
You are free to break lines when writing R code, but be careful inside R functions when the expression to be returned contains the operators “+” or “-”:
f1 = function() {
    1 + 1
}
f1() # of course 2
f2 = function() {
    1
    + 1
}
f2() # returns 1
f3 = function() {
    return(1
        + 1)
}
f3() # 2; use return() if you want to break lines, or
f4 = function() {
    1 +
        1
}
f4() # 2; don't put '+' at the beginning of a line, as '+1==1'
These examples are quite simple, but sometimes you will forget this rule, e.g.:
testMat = function(len = 50, digits = c(3, 5, 7, 9,
    11, 13, 15, 17, 19)) {
    n = length(digits)
    matrix(1:0, 2 * len, n)
    + matrix(10^digits, 2 * len, n, byrow = TRUE)
}
Can you find out the error in the above function without reading my hints beforehand?
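As a hint, one way to see what is going on is to ask the parser how many top-level expressions it finds in each layout. This is only a small sketch; the helper name n_exprs() is made up for illustration:

```r
# Count the top-level expressions the R parser sees in a piece of code.
# `n_exprs` is a hypothetical helper name used only for this sketch.
n_exprs <- function(code) length(parse(text = code))

n_exprs("1\n+ 1")    # 2: "1" is already complete, so "+ 1" starts a new expression
n_exprs("1 +\n1")    # 1: the line ends with "+", so the parser keeps reading
n_exprs("(1\n+ 1)")  # 1: the open parenthesis keeps the expression going
```

A function returns the value of its last expression, so in the first layout only the unary "+ 1" is returned and whatever was on the line before it is evaluated and thrown away.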
Yesterday I submitted a package called “animation” to CRAN. You may get the source code from CRAN (http://cran.r-project.org/src/contrib/Descriptions/animation.html), or download the Windows binary here.
This package aims at illustrating various statistical methods and data analyses in the form of animations. For further information, please read the vignette at:
http://cran.r-project.org/doc/vignettes/animation/animation.pdf
Highlight is indeed excellent for highlighting program code; however, its definition file for the R language is far from complete. Below is the original definition file:
$KW_LIST(kwa)=if else repeat while function for in next break
$KW_LIST(kwb)=NULL NA Inf NaN
$STRINGDELIMITERS=" '
$SL_COMMENT=#
$ESCCHAR=\
$SYMBOLS= ( ) [ ] { } , ; : & | < > ! = / * % + -
Certainly, what we still need are the thousands of keywords in R, most of which are functions. So how can we get the names of the functions in R packages? The solution is actually quite easy: just use ls() with the environment name specified. The following is my source code for picking out the names of functions in all packages which are in the search path:
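A minimal sketch of this idea (not necessarily the original code) might look like the following. The variable names pkgs, fun_names and kw_line are assumptions for illustration, and the filter on syntactic names is an extra step added here so that operators such as "+" or "[<-" are not listed as keywords:

```r
# All attached packages appear in the search path as "package:<name>".
pkgs <- grep("^package:", search(), value = TRUE)

# For each package, list its objects and keep only the functions.
fun_names <- unlist(lapply(pkgs, function(p) {
  env <- as.environment(p)
  objs <- ls(env)
  is_fun <- vapply(objs, function(n) is.function(get(n, envir = env)),
                   logical(1))
  objs[is_fun]
}))

# Drop duplicates and keep only plain names (letters, digits, "." and "_").
fun_names <- sort(unique(fun_names))
fun_names <- fun_names[grepl("^[A-Za-z.][A-Za-z0-9._]*$", fun_names)]

# One space-separated string, in the same shape as the keyword lists above.
kw_line <- paste(fun_names, collapse = " ")
length(fun_names)
```

Only packages that are currently attached are covered, so load any extra packages with library() first if their functions should be included.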

