pgfSweave, I begin to notice the font families in my graphs when writing Sweave documents. The default font family for PDF graphs is Helvetica, which is, in most cases (I think), inconsistent with the LaTeX font styles. Some common font families are listed in ?postscript, and we can take a look at them by:
for (f in c("AvantGarde", "Bookman", "Courier", "Helvetica",
    "Helvetica-Narrow", "NewCenturySchoolbook", "Palatino", "Times")) {
    pdf.options(family = f)
    pdf(paste(f, ".pdf", sep = ""))
    set.seed(123)
    plot(rnorm(25), pch = 1:25, xlab = "xlab family", ylab = "ylab font",
        main = paste("Font Families in R (PDF):", f))
    text(13, 0, "Text in the Middle")
    mtext(sprintf("pdf.options(family = \"%s\")", f), side = 4)
    dev.off()
}
Here is a merged PDF containing the above single PDF files:
It seems that "Bookman", "NewCenturySchoolbook", "Palatino" and "Times" can be better choices when using Sweave because they are serif fonts, which are usually more consistent with LaTeX PDF.
Today Romain Francois posted an interesting topic on the R-help list, and you can read his blog post for more details: celebrating R commit #50000. 50000 is certainly not a small number; we do owe the R core members a big “thank you” for their great efforts on this fantastic statistical language over the past 13 years. When I saw Romain’s data, I suddenly remembered a question I asked one of Prof Ripley’s students a couple of years ago: does Prof Ripley ever sleep? And he answered “No!”. No wonder we see Prof Ripley so frequently on the R-help/devel mailing lists. If you have stayed on the R-help list long enough, you’ll surely know several facts, e.g. Martin Maechler will arrive in less than 3 minutes if you dare call an R package a “library”, and you will get “Ripleyed” if you are not careful enough in posting your R code.
> library(fortunes)
> fortune("Ripleyed")
And the fear of getting Ripleyed on the mailing list also makes me think, read,
and improve before submitting half baked questions to the list.
-- Eric Kort
R-help (January 2006)
Today Ruya Gokhan Kocer asked me how to use the R function identify() in off-screen graphics devices. Actually it’s pretty easy as long as we obtain the list returned by identify(pos = TRUE). For example,
# open a windows device
x11()
x = rnorm(20)
y = rnorm(20)
plot(x, y)
# identify 5 points
id = identify(x, y, n = 5, pos = TRUE)
# $ind
# [1] 2 6 10 14 16
#
# $pos
# [1] 1 1 4 4 1
# then open a bitmap device
png("identify.png")
plot(x, y)
# use the information from above mouse click
text(x[id$ind], y[id$ind], id$ind, pos = id$pos)
dev.off()
The other day I sent a small assignment to a group of people so that they could “play” with statistics and become more interested in this subject. The data provided to them is:
Download the file here (104K)
The data-generating process was quite simple: first I generated 20000 random numbers (10000 rows, 2 columns) from N(0, 1), and then I added 10000 rows of numbers which lie exactly on a circle; finally I provided the data in a randomized order so that people could not easily discover the pattern just from the numbers.
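The data-generating process described above can be sketched roughly as follows. This is only a reconstruction under assumptions: the radius of the circle, the seed, and the object names (noise, circle, dat) are my own choices for illustration and are not taken from the original data file.

```r
set.seed(2007)  # arbitrary seed, only for reproducibility

n <- 10000

# 20000 random numbers from N(0, 1), arranged as 10000 rows and 2 columns
noise <- matrix(rnorm(2 * n), ncol = 2)

# 10000 points lying exactly on a circle centered at the origin
r <- 0.5                        # ASSUMED radius; the original value is unknown
theta <- runif(n, 0, 2 * pi)
circle <- cbind(r * cos(theta), r * sin(theta))

# stack both parts and shuffle the rows to hide the pattern
dat <- rbind(noise, circle)
dat <- dat[sample(nrow(dat)), ]
```

Looking at head(dat) gives no hint of the circle, which is the point of the randomized row order.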
The question is, how to reveal the particular pattern in this “pile of sand”? Let’s look at the original plot:

The original scatter plot
What can we observe from this scatter plot? Perhaps nothing but “a pile of sand”. However, if we choose alternative ways to create the plot again, things will be completely different. Here are my approaches:
In some operating systems, a few R graphical devices might not be available, so we should check the capabilities of the devices before writing code that creates image files; otherwise errors may occur. The function to use is just capabilities().
I didn’t notice this and was wondering why there were errors in the check summary of my R package “animation”. Now I understand the reason, so I’ll modify the function savePNG() a little.
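A minimal sketch of such a check is shown below. It is not the actual change made to savePNG(); the file name and the fallback to pdf() are assumptions for illustration only.

```r
# base name of the output file (hypothetical, written to a temporary directory)
out <- file.path(tempdir(), "capability-demo")

# capabilities("png") is TRUE only if the png() device is available
if (isTRUE(capabilities("png"))) {
  outfile <- paste0(out, ".png")
  png(outfile)
} else {
  # fall back to a device that is always built into R
  outfile <- paste0(out, ".pdf")
  pdf(outfile)
}
plot(1:10, main = "Device chosen by capabilities()")
dev.off()
```

Calling capabilities() without arguments returns a named logical vector covering all the features it knows about, which is handy for a quick look at what the current R build supports.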
vi.lilac.chaser().
This idea suddenly came to my mind yesterday. Actually some optical illusions can be created very easily with the R graphics system. Here is one example I wrote yesterday:
Download the file here
# By Yihui XIE, Dec 22, 2007 www.yihui.name
op = par(bg = "gray", mar = rep(2, 4), xpd = NA)
x = seq(0, 2 * pi, length.out = 16)
invisible(replicate(100, {
    for (i in seq_along(x)) {
        # empty canvas, then all magenta points except the i-th one
        plot(1, xlim = c(-1, 1), ylim = c(-1, 1), axes = FALSE, ann = FALSE,
            type = "n")
        points(sin(x[-i]), cos(x[-i]), col = "magenta", cex = 7,
            pch = 19)
        # the cross in the center to focus on
        points(0, 0, pch = "+", cex = 5, lwd = 2)
        Sys.sleep(0.05)
    }
}))
par(op)
Focus your eyes on the center “+” for a few seconds, and you will find the color of the “circling” point just changes (to green). Perhaps I’ll write a package for these illusions next year.
This afternoon I went to the Beijing Customs to give a lecture on sampling techniques as well as my R program. Actually I didn’t make any preparations until late this morning. When I finished my lunch, I made some animated pictures to illustrate four kinds of sampling methods: simple random sampling, stratified sampling, cluster sampling and systematic sampling.
After I came back to school, I added these animations to my little project “Animated Statistics Using R”. You may see them here.
I’m going to give a talk in the CUEB on some topics in the discipline of statistics at the invitation of the Association of Statistics of CUEB, and I’ve mainly prepared two topics for them: one is about those jokes from Prof. Gary’s gallery of statistics jokes, and the other is about some tools for the research of statistics. Below are materials for this talk:
Slides for “Jokes in Statistics” (English, PDF by LaTeX):
Download the file here
Slides for “A Leisure Look on Some Tools for Statistics” (Chinese, PDF by PowerPoint):
Download the file here
R code for my talk (most of the scripts contain somewhat interesting animations):
Download the file here
If there are any errors in these materials, please tell me (through email x@y with x = xieyihui & y = gmail.com or leave a message here directly). Thanks!
P.S. The time of this talk has been decided now: Nov 1st, 2007. For details please refer to: http://cos.name/bbs/read.php?tid=8122
Many graphical functions offer the parameter alpha, which is usually used to specify semi-transparent colors. However, such colors can only be displayed in certain devices, as stated in the help of rgb():
Semi-transparent colors (0 < alpha < 1) are supported only on some devices: at the time of writing only on the quartz devices as well as several third-party devices such as those in packages Cairo, cairoDevice, JavaGD and RSvgDevice.
Here is an example illustrating semi-transparent colors in a pdf device:
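A minimal sketch of such an example, assuming simulated data and a temporary output file (both are my own choices for illustration), could be:

```r
# write to a temporary PDF file (hypothetical file name)
out <- file.path(tempdir(), "alpha-demo.pdf")

# PDF version 1.4 or later is needed for transparency
pdf(out, version = "1.4")

set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000)

# the fourth argument of rgb() is alpha: 0 is fully transparent, 1 is opaque
plot(x, y, pch = 19, col = rgb(0, 0, 1, alpha = 0.1),
     main = "Semi-transparent points in a pdf device")
dev.off()
```

Regions where many points overlap appear darker, so the density of the point cloud becomes visible.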
I wrote this demo about three months ago when I was illustrating the “Gradient Descent” algorithm in the class “Data Mining & Machine Learning”. I like to combine iterations (or loops) with animated pictures, because it’s simple and instructive, and of course, it’s easy in R: just use Sys.sleep() to control the time between the steps of your demonstration, and some low-level graphics functions such as lines(), points(), rect(), polygon() and segments() to illustrate the process of your algorithm. To understand the figure below, you need to know what a contour plot is.
The code for the above example is as follows:
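A minimal sketch of this kind of animation is given here. It is not the original code from the class: the objective function f(x, y) = x^2 + 2 * y^2, the starting point, and the step size are all assumptions chosen for illustration.

```r
# ASSUMED objective function and its gradient
f <- function(x, y) x^2 + 2 * y^2
grad <- function(x, y) c(2 * x, 4 * y)

# grid for the contour plot
gx <- seq(-3, 3, length.out = 50)
gy <- seq(-3, 3, length.out = 50)
z <- outer(gx, gy, f)

# draw on screen when interactive, otherwise into a temporary PDF file
if (!interactive()) pdf(file.path(tempdir(), "grad-desc.pdf"))
contour(gx, gy, z, xlab = "x", ylab = "y", main = "Gradient descent")

p <- c(-2.5, 2.5)  # arbitrary starting point
gamma <- 0.1       # arbitrary step size
points(p[1], p[2], pch = 19, col = "red")
for (i in 1:50) {
  # move against the gradient
  p_new <- p - gamma * grad(p[1], p[2])
  segments(p[1], p[2], p_new[1], p_new[2], col = "red")
  points(p_new[1], p_new[2], pch = 19, cex = 0.5, col = "red")
  p <- p_new
  # pause between steps so the path appears as an animation
  if (interactive()) Sys.sleep(0.1)
}
if (!interactive()) dev.off()
```

With these settings each step shrinks x by a factor of 0.8 and y by a factor of 0.6, so the red path moves toward the minimum at the origin.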


