May 25, 2011
Please ignore this post completely, because Sweave support has become mature in LyX since 2.0.2, and I’m not going to add the pgfSweave module in LyX. For pgfSweave users, you may consider the new knitr module (available since 2.0.3) which uses the R package knitr.

About half a year ago, I wrote a post on the configuration of (pgf)Sweave and LyX, which was intended to save us some effort in going through all the details of the configuration. Now many things have changed: LyX 2.0 has internal support for Sweave, and fortunately I have been in touch with the developers about this feature (thanks to Gregor); meanwhile, there have also been many changes in the pgfSweave package. In all, we have a number of new features which we should definitely make use of.

New Features

A list of new features as far as I can remember:

  1. support for Sweave in LyX 2.0 is internal, so there is no need to modify the preferences file manually (the converters have been defined internally)
  2. most importantly, Sweave becomes an independent module now in LyX, which means you can use it with arbitrary layouts!
  3. we can see the messages during compilation in LyX 2.0 (View --> View Messages), which is really helpful; I strongly recommend turning this option on when compiling Sweave documents, because you will know which code chunk went wrong in case of errors (in the past, you only got an annoying error dialog box that told you almost nothing about the error)
  4. pgfSweave is faster: it uses the GNU make utility to compile graphics, and you can use multiple cores if you like; the compilation now takes 3 steps (pdflatex, make, then pdflatex); other nice features: the R code is now put in an environment Hinput, so you can customize it in the LaTeX preamble, and there is no longer a huge gap between the R code and the output (fixed by Liang Qi)…
  5. tikzDevice has better support for multi-byte characters (using UTF8)

I have been working on improving the Sweave module and adding a new pgfSweave module to LyX, and now I have basically finished what I planned to do. See the ticket #7555 for details. To sum up,

  1. LaTeX will not complain about not being able to find Sweave.sty; I used several tricks to guarantee this — even in the worst case, LaTeX can still use the hard-coded Sweave style;
  2. Spaces and dots in path names or filenames will no longer be a problem;
  3. the pgfSweave module is also working now;
  4. you can export the reformatted R code in a LyX document with the pgfSweave module.

I have also tried to document all the cool bells and whistles in two examples, sweave.lyx and pgfsweave.lyx. Or you can directly read the PDF documents, sweave.pdf.tar.xz and pgfsweave.pdf.

Try the (pgf)Sweave Module

It seems several people are interested in testing the two modules, and it is actually very easy under Linux. So far I have had no luck building LyX from source under Windows (I tried once; it took me days to compile and ended up with errors).

  1. check out the source code of LyX: svn co svn://svn.lyx.org/lyx/lyx-devel/trunk lyx-devel
  2. (cd lyx-devel) apply my patch sweave-patch.diff to the svn source you checked out just now: patch -p0 -i /path/to/sweave-patch.diff
  3. build LyX
    ./autogen.sh
    ./configure
    make
    sudo make install

Done.

Currently the patch is still waiting on LyX Trac. If you run into any problems before the developers begin to look at the patch, please let me know and we will try to make the two modules more stable and useful. But first of all, please remember to keep all your software packages up-to-date: R 2.13.0, pgfSweave 1.2.1 (run update.packages() in R as frequently as you can) and LaTeX package pgf 2.10 (very important for externalization).
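The version requirements above can be checked from within R. This is only a minimal sketch: the helper name meets_minimum is made up for illustration, and the version strings are the ones listed in this post. The version of the LaTeX package pgf cannot be checked this way.

```r
# Hypothetical helper (not part of any package): TRUE if `pkg` is installed
# and its version is at least `minimum`.
meets_minimum <- function(pkg, minimum) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= minimum
}

# R itself: getRversion() can be compared with a version string
r_ok <- getRversion() >= "2.13.0"

# Minimum version taken from the text above
pgf_ok <- meets_minimum("pgfSweave", "1.2.1")

cat("R >= 2.13.0:", r_ok, "\n")
cat("pgfSweave >= 1.2.1:", pgf_ok, "\n")
```

If either line prints FALSE, running update.packages() (or installing a newer R) is the next step.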

Apr 30, 2011

I remember that a few weeks ago there was a challenge on the R-help list to draw the prime symbol in R graphics. In LaTeX, we simply write $X'$ or $X^\prime$. R has rough support for math expressions (see demo(plotmath)), which is certainly unsatisfactory for LaTeX users. In fact we can write native LaTeX code in R plots via the tikzDevice package! Why bother to use all kinds of tricks to cheat R? :)

Here is an example at the request of a reader of my blog:

authentic math formula in R
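For readers who cannot see the image, here is a minimal sketch of the tikzDevice approach. It is not the original example behind the figure; the function name draw_prime_demo and the file name are made up. It assumes the tikzDevice package and a working LaTeX installation, so the plotting code is wrapped in a function and nothing is drawn until you call it.

```r
# Hypothetical demo function: draws an empty plot whose labels are LaTeX code.
draw_prime_demo <- function(file = "prime-demo.tex") {
  # standAlone = TRUE produces a complete .tex file that pdflatex can compile
  tikzDevice::tikz(file, width = 4, height = 4, standAlone = TRUE)
  on.exit(dev.off())
  # Text passed to plotting functions is typeset by LaTeX, so $...$ works;
  # backslashes must be doubled inside R strings
  plot(0, 0, type = "n", xlab = "$X'$", ylab = "$X^\\prime$",
       main = "Prime symbols: $X'$ and $X^{\\prime\\prime}$")
  invisible(file)
}

# Usage (then run pdflatex on the generated file):
# draw_prime_demo("prime-demo.tex")
```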

Oct 30, 2010
Warning: this post is becoming dated! I’m working with LyX 2.0 now and you are encouraged to try the new version in the future. Some preliminary work can be found here.

0. Summary

Take a look at the video in this entry if you don’t understand the title. In short,

  1. install LyX and R as well as a working LaTeX toolkit such as MiKTeX or TeX Live or MacTeX;
  2. run source('http://gitorious.org/yihui/lyx-sweave/blobs/raw/master/lyx-sweave-config.R') in R under Windows or Ubuntu or Mac; I tried my best to automatically configure LaTeX, R and LyX;
  3. restart LyX as instructed, and you can enjoy pgfSweave in LyX now: either play with my demos (demo 1; demo 2 with bibliography; a beamer demo; an animation demo with PDF output), or DIY: create a new document, change the document class to article (Sweave noweb) from Document --> Settings, switch the environment to Scrap from the drop-down list in the top-left corner, start your Sweave code chunks like
    <<test>>=
    rnorm(10)
    @

    and click the PDF button to compile this document. Done. Take a look at this video if you feel confused.
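To see what happens to such a chunk outside LyX, here is a rough sketch of the weaving step alone, using plain Sweave from the utils package. The file name test.Rnw is made up, and the converter configured in LyX may do more than this (e.g. call pgfSweave and then LaTeX).

```r
# Write a minimal Sweave (noweb) document containing the chunk from the text
rnw <- file.path(tempdir(), "test.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<test>>=",
  "rnorm(10)",
  "@",
  "\\end{document}"
), rnw)

# Sweave() writes test.tex into the current working directory
owd <- setwd(tempdir())
utils::Sweave(rnw, quiet = TRUE)
setwd(owd)

tex <- readLines(file.path(tempdir(), "test.tex"))
# The default driver wraps the echoed R code in an Sinput environment
print(grep("Sinput", tex, value = TRUE))
```

The resulting test.tex can then be compiled with pdflatex, which is what the PDF button in LyX takes care of for you.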

This works for MiKTeX under Windows (Server 2003 / Win7), TeX Live 2009 under Ubuntu 10.10, and MacTeX 2010 under Mac OS; R 2.12.0 or 2.11.1; LyX 1.6.x.

Oct 17, 2010

This blog post is mainly for Stat 579 students on the homework for week 7, since I received too many “gory” loops in the homework submissions and I think it would help a bit to write my thoughts on R loops for beginners. The immortal motto for newbies in programming is:

If you want to make an apple pie from scratch, you must first create the universe.

Carl Sagan

There have been endless wars over which programming language is better than the others, but in my view the question is nothing but a balance between code performance and the amount of work for programmers. In an extreme sense, almost all languages give you the ability to create the universe, but you do not really have to if you just want to make an apple pie.

R was born after S, a language which was invented “to turn ideas into software, quickly and faithfully” and received the ACM Software System Award in 1998. Before the S language, statisticians often had to write “gory” low-level computing routines to do data analysis and statistical computation, including those “gory” loops, of course. For example, imagine what you have to do to compute the correlation coefficients in C.

R has wrapped a lot of common tasks written in lower-level programming languages (mainly C and Fortran) into functions that are easier to call and faster to run (R’s explicit loops are generally slower than loops in low-level languages), which frees statisticians from paying too much attention to the gory details of computation. However, the consequence is that we have too many tools in our hands, many of which we are unaware of. I have no quick solution to this problem: we have to learn more about the capabilities of R in many ways, e.g. reading the R-help mailing list, asking experts, doing daily work with R, reading the source code of R functions and playing with the examples in help pages.
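To make the point concrete, here is a small sketch with the correlation coefficient mentioned above. The function cor_loop is made up for illustration and the data are simulated; it computes the Pearson correlation with explicit loops, which is roughly what one would have to write in a low-level language, and compares the result with the built-in cor().

```r
set.seed(579)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# "Create the universe": Pearson correlation with explicit loops
cor_loop <- function(x, y) {
  n <- length(x)
  sx <- 0; sy <- 0
  for (i in seq_len(n)) {
    sx <- sx + x[i]
    sy <- sy + y[i]
  }
  mx <- sx / n; my <- sy / n
  sxy <- 0; sxx <- 0; syy <- 0
  for (i in seq_len(n)) {
    sxy <- sxy + (x[i] - mx) * (y[i] - my)
    sxx <- sxx + (x[i] - mx)^2
    syy <- syy + (y[i] - my)^2
  }
  sxy / sqrt(sxx * syy)
}

# "Make the apple pie": the built-in function does the same job
r_loop <- cor_loop(x, y)
r_builtin <- cor(x, y)
print(all.equal(r_loop, r_builtin))
```

Both versions give the same number; the second one is a single function call.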

Sep 11, 2010

Tal Galili asked on the R-help mailing list for a SyntaxHighlighter brush for the R language, so that WordPress users can highlight their R code easily. I promised to spend a few minutes on this task, and here is the result:

shBrushR.js (1Kb)
/**
 *  Author: Yihui Xie
 *  URL: http://yihui.name/en/2010/09/syntaxhighlighter-brush-for-the-r-language
 *  License: GPL-2 | GPL-3
 */
SyntaxHighlighter.brushes.R = function()
{
    var keywords = 'if else repeat while function for in next break TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_ NA_complex_ NA_character_';
    var constants = 'LETTERS letters month.abb month.name pi';
    this.regexList = [
	{ regex: SyntaxHighlighter.regexLib.singleLinePerlComments,	css: 'comments' },
	{ regex: SyntaxHighlighter.regexLib.singleQuotedString,		css: 'string' },
	{ regex: SyntaxHighlighter.regexLib.doubleQuotedString,		css: 'string' },
	{ regex: new RegExp(this.getKeywords(keywords), 'gm'),		css: 'keyword' },
	{ regex: new RegExp(this.getKeywords(constants), 'gm'),		css: 'constants' },
	{ regex: /[\w._]+[ \t]*(?=\()/gm,				css: 'functions' }
    ];
};
SyntaxHighlighter.brushes.R.prototype	= new SyntaxHighlighter.Highlighter();
SyntaxHighlighter.brushes.R.aliases	= ['r', 's', 'splus'];
Aug 14, 2010

Auto-completion is a fancy feature in a text editor. Notepad++ does not support auto-completion for the R language, so I spent a couple of hours creating an XML file to support R:

Download: R.xml (938Kb)

Put it under 'plugins/APIs' in the installation directory of Notepad++ (you can see several other XML files there supporting different languages such as C), and make sure you have enabled auto-completion in Notepad++ (Settings --> Preferences --> Backup/Auto-completion). Open an R script and start typing a familiar function (e.g. paste()), and you will see some candidates in a drop-down list like this:

Show parameters of R functions in Notepad++

Hit the Enter key if the function name selected in the list is the one you want, then type '(' and you will see hints for the parameters:

Auto-completion in Notepad++ for R script

The file R.xml was actually generated from R; it contains almost all visible R objects in the base R packages as well as recommended packages like MASS. You may create an extended XML file (containing keywords from other packages) yourself: load the packages you need into your current R session, then run:

source('http://yihui.name/en/wp-content/uploads/2010/08/Npp_R_Auto_Completion.r')
# R.xml will be generated under your current work directory: getwd()
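The idea behind such a generator can be sketched in a few lines. This is not the script linked above; the helper names xml_escape and npp_keyword are made up, and the KeyWord/Overload/Param layout is my understanding of the Notepad++ auto-completion format, so treat it as an assumption and compare with the XML files shipped in 'plugins/APIs'.

```r
# Escape the characters that are special inside XML attribute values
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  gsub('"', "&quot;", x, fixed = TRUE)
}

# Build one <KeyWord> entry for the function named `fname` (hypothetical helper)
npp_keyword <- function(fname) {
  f <- get(fname, mode = "function")
  # args() also works for most primitives, for which formals() alone is NULL
  params <- names(formals(args(f)))
  c(sprintf('<KeyWord name="%s" func="yes">', xml_escape(fname)),
    '  <Overload retVal="">',
    sprintf('    <Param name="%s" />', xml_escape(params)),
    '  </Overload>',
    '</KeyWord>')
}

entry <- npp_keyword("paste")
writeLines(entry)
```

A full file would loop over the objects of every attached package (e.g. via ls("package:base")) and wrap the entries in the enclosing elements that Notepad++ expects.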
Apr 15, 2010

I came across this blog post just now: The Next Big Thing, and of course these words caught my attention:

[...] However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.

I don’t really understand how (much more?) difficult it is to install and maintain R. Usually it takes about one minute to install it from the binary (and SAS? SPSS? buy it, find a technician, install it, maintain it according to different licenses – single PC or server or other types, continue to pay only tens of thousands of dollars next year, …). For learning, it depends. I don’t think it is too difficult for people who know statistics well, and as for the rest, do they really feel safe doing something they do not understand? As for documentation, some people prefer simple documents and some prefer handbooks (SAS-style).

In all, I cannot see why R is an epic fail for the above reasons…

What? Data visualization?…

The R community must have been tired of comparing SAS with R. Please don’t tell Prof Frank Harrell about this post…

Mar 28, 2010

When we want to call external programs in R under Windows, we often need to know the paths of these programs. For instance, we may want to know where ImageMagick is installed, as we need the convert (convert.exe) utility to convert images to other formats, or where OpenBUGS is installed because we need this path to use the function bugs(). Usually this problem does not exist under Linux, because the executables (or their symbolic links) are often put in the directories which are in the environment variable PATH (e.g. /usr/bin, /usr/local/bin).

However, we may be able to find the paths through the registry if the installer saves the path information there. The R function to use is readRegistry():

## ImageMagick:
## I used this trick in the function saveMovie (the animation package)
> readRegistry("SOFTWARE\\ImageMagick\\Current")
$BinPath
[1] "C:\\Program Files\\ImageMagick"
$CoderModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\coders"
$ConfigurePath
[1] "C:\\Program Files\\ImageMagick\\config"
$FilterModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\filters"
$LibPath
[1] "C:\\Program Files\\ImageMagick"
$QuantumDepth
[1] 16
$Version
[1] "6.3.8"

## OpenBUGS
> r = names(readRegistry("Software\\Microsoft\\Windows\\ShellNoRoam\\MUICache",
+    "HCU"))
> dirname(r[grep("OpenBUGS\\.exe", r)])
[1] "C:/Program Files/OpenBUGS"

There is no guarantee that this approach will work on every Windows platform, but I think it is better than explaining what the PATH variable is to some Windows users…
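The two strategies (registry under Windows, PATH elsewhere) can be combined in one helper. This is a minimal sketch and not the code used in saveMovie(); the function name find_convert is made up, and the registry key is the one shown above.

```r
# Hypothetical helper: returns the path of ImageMagick's convert, or "" if
# it cannot be found.
find_convert <- function() {
  if (.Platform$OS.type == "windows") {
    # A missing key makes readRegistry() fail, so turn the error into NULL
    reg <- tryCatch(
      utils::readRegistry("SOFTWARE\\ImageMagick\\Current"),
      error = function(e) NULL
    )
    if (!is.null(reg$BinPath)) {
      return(file.path(reg$BinPath, "convert.exe"))
    }
  }
  # Fall back to the PATH. Caution: under Windows the PATH may lead to an
  # unrelated system tool that is also named convert.exe.
  unname(Sys.which("convert"))
}

print(find_convert())
```

The registry is tried first under Windows precisely because of the name clash mentioned in the comment.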

Mar 24, 2010

Amber Watkins gave me a suggestion on an animation for ratio estimation, and I think this is a good topic for my animation package. I’ve finished writing the initial version of the function sample.ratio() for this package, which will appear in version 1.1-2 in a couple of days.

As we know, the benefit of ratio estimation is that sampling skewness may be adjusted for, because the estimate of $\bar{Y}$ makes use of the information in the relationship between X and Y: $\bar{X} \cdot (\bar{y}/\bar{x})$. Here is a demo (we can see that the ratio estimate, denoted by the red line, generally performs better than $\bar{y}$):

An animation demo for the ratio estimation
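The same comparison can be made numerically. The sketch below is not sample.ratio() from the animation package; it simulates a made-up population in which Y is roughly proportional to X and compares the mean squared errors of the simple estimate (the sample mean of y) and the ratio estimate (the population mean of X times the ratio of the sample means).

```r
set.seed(2010)

# A hypothetical finite population in which Y is roughly proportional to X
N <- 1000
X <- runif(N, 10, 100)
Y <- 2 * X + rnorm(N, sd = 5)
Xbar <- mean(X)   # known population mean of the auxiliary variable
Ybar <- mean(Y)   # target quantity (unknown in practice)

# Draw repeated simple random samples and compute both estimates
n <- 30
est <- replicate(1000, {
  idx <- sample(N, n)
  ybar <- mean(Y[idx])
  xbar <- mean(X[idx])
  c(simple = ybar, ratio = Xbar * ybar / xbar)
})

# Mean squared error of each estimator around the true population mean
mse <- rowMeans((est - Ybar)^2)
print(mse)
```

With a population like this one, the ratio estimate has a much smaller mean squared error, because an unlucky sample with a large mean of x is corrected by the known population mean of X.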

Feb 18, 2010

For a long time I’ve been wondering why we are not able to use Enter in the LyX Scrap environment which was set up by Gregor Gorjanc for Sweave. Two weeks ago, I (finally!) could not help asking Gregor about this issue, as I’m using “LyX + Sweave” more and more in my daily work. He explained it here: LyX-Sweave: mandatory use of control+enter in code chunks

After digging into the LyX customization manual for a while, I found a solution which allows us to press the Enter key just as we normally do when typing in a LyX document. The key is to use Environment instead of Paragraph as the LatexType in the style definition of Scrap. In addition, I set the LatexName to wrapsweave, since LyX requires a LatexName. The definition of wrapsweave is simple: just two empty lines produced by \par. (If you define it as \newenvironment{wrapsweave}{}{}, you will sometimes run into trouble, especially when paragraphs are indented.)

As we know, a LaTeX environment cannot be centered in LyX (only paragraphs can), so I defined a special environment, ScrapCenter, for when I want to insert graphics via Sweave and center them.

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2012 by Yihui Xie