Computer Science

I’m interested in computer technologies, especially in software; I know very little about hardware

Aug 222010

I’m really surprised that most beamer slides I’ve ever seen have Figure/Table captions like this:

Figure: blabla blabla

or

Table: blabla blabla

which should have been Figure 1 or Table 2.

Why are the caption numbers missing? This is because beamer does not produce these numbers by default. To enable numbered captions, you have to put this in the preamble of your LaTeX document:

\setbeamertemplate{caption}[numbered]

I’m a picky LaTeX users… I cannot stand captions without numbers.

Aug 142010
Update: now the long hints for function parameters can be broken into several shorter lines.

Auto-completion is fancy in a text editor. Notepad++ does not support auto-completion for the R language, so I spent a couple of hours on creating such an XML file to support R:

Download: R.xml (938Kb)

Put it under ‘plugins/APIs‘ in the installation directory of Notepad++ (you can see several other XML files there supporting different languages such as C), and make sure you have enabled auto-completion in Notepad++ (Settings --> Preferences --> Backup/Auto-completion). Open an R script and start typing a familiar function (e.g. paste()), you will see some candidates in a drop-down list like this:

Show parameters of R functions in Notepad++

Show parameters of R functions in Notepad++

Hit the Enter key if the function name selected in the list is correct for you, then type ‘(‘ and you will see hints for parameters:

Auto-completion in Notepad++ for R script

Auto-completion in Notepad++ for R script

The file R.xml was actually generated from R; it contains almost all visible R objects in base R packages as well as recommended packages like MASS. You may create an extended XML file (containing keywords from other packages) by yourself after loading the packages you need into your current workspace, and run:

source('http://yihui.name/en/wp-content/uploads/2010/08/Npp_R_Auto_Completion.r')
# R.xml will be generated under your current work directory: getwd()
Mar 282010

When we want to call external programs in R under Windows, we often need to know the paths of these programs. For instance, we may want to know where ImageMagick is installed, as we need the convert (convert.exe) utility to convert images to other formats, or where OpenBUGS is installed because we need this path to use the function bugs(). Usually this problem does not exist under Linux, because the executables (or their symbolic links) are often put in the directories which are in the environment variable PATH (e.g. /usr/bin, /usr/local/bin).

However, we may be able to find the paths through the registry if the installation will save the path info in the registry hive. The R function is readRegistry():

## ImageMagick:
## I used this trick in the function saveMovie (the animation package)
> readRegistry("SOFTWARE\\ImageMagick\\Current")
$BinPath
[1] "C:\\Program Files\\ImageMagick"
$CoderModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\coders"
$ConfigurePath
[1] "C:\\Program Files\\ImageMagick\\config"
$FilterModulesPath
[1] "C:\\Program Files\\ImageMagick\\modules\\filters"
$LibPath
[1] "C:\\Program Files\\ImageMagick"
$QuantumDepth
[1] 16
$Version
[1] "6.3.8"

## OpenBUGS
> r = names(readRegistry("Software\\Microsoft\\Windows\\ShellNoRoam\\MUICache",
+    "HCU"))
> dirname(r[grep("OpenBUGS\\.exe", r)])
[1] "C:/Program Files/OpenBUGS"

There is no guarantee for this approach to work on any Windows platforms, but I think this is better than explaining what is the PATH variable to some Windows users…

Mar 122010

Stephanie asked in 511 today if we were able to get the random seed which was set by set.seed() but we were only given the random numbers (without knowing the seed). This kind of “hacker” questions sound interesting. One dirty solution should be the brute-force method, e.g:

# x: the random vector;
# FUN: the function that generates random numbers with the first argument
#      being the length of random numbers
# seed: candidate seeds to be tried one by one
# ...: other arguments to be passed to FUN
find.seed = function(x, FUN = rnorm, seed = 0:10000, ...) {
    res = NULL
    for (i in seed) {
        set.seed(i)
        rx = FUN(length(x), ...)
        # all() can be changed to all.equal() to obtain a rough solution
        #     allowing a little bit numeric errors
        if (all(x == rx)) {
            res = i
            break
        }
    }
    res
}
Feb 182010

For a long time I’ve been wondering why we are not able to use Enter in the LyX Scrap environment which was set up by Gregor Gorjanc for Sweave. Two weeks ago, I (finally!) could not help asking Gregor about this issue, as I’m using “LyX + Sweave” more and more in my daily work. He explained it here: LyX-Sweave: mandatory use of control+enter in code chunks

After digging into the LyX customization manual for a while, I found a solution which allows us to press the Enter key just as we normally do when typing in a LyX document. The key is to use Environment instead of paragraph as LatexType for the style definition of Scrap. Besides, I used the LatexName as wrapsweave, as a LatexName is required by LyX. The definition for wrapsweave is simple: just two empty lines by \par. (If you define it as \newenvironment{wrapsweave}{}{}, you will run into troubles sometimes; especially when you use indent for paragraphs.)

As we know, LaTeX environment cannot be centered in LyX (only paragraphs can), so I defined a special environment ScrapCenter when I want to insert graphics via Sweave and make them center-aligned.

Sep 262009

As Sir Francis Bacon said, “Histories make men wise; poets witty; the mathematics subtile[1]; natural philosophy deep; moral grave; logic and rhetoric able to contend.” And Windows stupid.

He should have added the last sentence if he were a Windows user in this age.

1. Avoid Using M$ Excel

A lot of R users often ask this question: “How to import MS Excel data into R?” Well, my suggestion is, avoid using M$ Excel if you are a statistician (or going to be a statistician) because you just cannot imagine how messy Excel data can be: some cells might be merged, some are colored, some texts are bold, several data tables can be put everywhere (e.g. cell(1,1) to (10,4), and (17,3) to (25,9)), stupid bar plots and pie charts are inserted in the sheets, silly statistical procedures that are wrong forever… If you don’t trust my words (yes, I’m a nobody), just read the examples here: Problems with Excel (collected by Prof Harrell).

I know there are reasons for you to continue using Excel. Your boss required you to do so; you don’t have time to learn more about various data formats; everybody is using Excel, and you don’t want to be so cool to use R; or if you finish your tasks too quickly and accurately, your boss will doubt whether you have really spent time on working, hence you will get less money paid (this is a REAL story for me – though I didn’t get less payment, I was indeed doubted when I used R); …

Jun 102009
Tag cloud is a bunch of words drawn in a graph with their sizes proportional to their frequency; it’s widely used in blogs to visualize tags. We can observe important words quickly from a tag cloud, as they often appear in large fontsize. Tony N. Brown asked how to “graphically represent frequency of words in a speech” the other day in R-help list, which is actually a problem about the tag cloud:

I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. [...]

Marc Schwartz mentioned that Gorjanc Gregor has done some work years ago using R (in grid graphics). The obstacle of creating tag cloud in R, as Gorjanc wrote, lies in deciding the placement of words, and it would be much easier for other applications such as browsers to arrange the texts. That’s true — there have already been a lot of mature programs to deal with tag cloud. One of them is the wp-cumulus plugin for WordPress, which makes use of a Flash object to generate the tag cloud, and it has fantastic 3D rotation effect of the cloud.

1. Arranging text labels with pointLabel()

Before introducing how to port the plugin into R, I’d like to introduce an R function pointLabel() in maptools package and it can partially solve the problem of arranging text labels in a plot (using simulated annealing or genetic algorithm). Here is a simulated example:

Simulated Tag Cloud with R function pointLabel() in maptools

Simulated Tag Cloud with R function pointLabel() in maptools

Jun 042009

Currently I haven’t found a good plugin to insert Google Adsense code into the bbPress forum (for the plugin adsense-for-bbpress, I don’t like the idea of posting Adsense code as if it were a post), so I opened the template file post.php and manually inserted the code as:

<div class="threadpost">
	<div class="post">
	<?php
	if ($bb_alt['post'] == '0' || (is_int($bb_alt['post']/2)) && rand(0, 20) == 10) {
	?>
		<span style="float:right;padding-left:1em;">
		<script type="text/javascript"><!--
		google_ad_client = "pub-2679974521646557";
		/* 200x200 @ EN */
		google_ad_slot = "0041982581";
		google_ad_width = 200;
		google_ad_height = 200;
		//-->
		</script>
		<script type="text/javascript"
		src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
		</script></span>
	<?php } ?>
	<?php post_text(); ?>
	</div>
	<div class="poststuff"><?php printf( __('Posted %s ago'), bb_get_post_time() ); ?>
	<a href="<?php post_anchor_link(); ?>">#</a> <?php post_ip_link(); ?>
	<?php post_edit_link(); ?> <?php post_delete_link(); ?></div>
</div>

The variable $bb_alt['post'] has recorded the order of a post as 1, 2, …, n. My code above makes sure that the adsense code will appear

  1. in the top post with probability 100%; (guaranteed by $bb_alt['post'] == '0')
  2. randomly in those posts of even-order (is_int($bb_alt['post']/2)) with probability 1/21≈5% (rand(0, 20) == 10), but this probability depends on the RNG in PHP;

To a statistician, the only thing that is not random is “everything is random”. :grin:

May 302009
This plugin is based on the post “Convert MySQL Tables to UTF-8” and an existing plugin by g30rg3_x. The reason I modified their code is that they will convert all tables in your database to the UTF-8 charset, but what we need is to convert WP tables, so I changed the code "SHOW TABLES" to "SHOW TABLES LIKE " . $table_prefix . "%", which will guarantee other tables could stay untouched. Besides, g30rg3_x’s purpose was to alter the charset of old WP databases to new UTF-8 databases, but in fact we also need to change the charset after we moved our DB to a new host when the charset is not UTF-8 by default. Judging from my experience, the default charset/collation for many web hosts is latin1/latin1_swedish_ci (I don’t know why), whereas popular web-buidling systems often use utf8/utf8_general_ci, thus we need to change the charset before all content could be normally displayed. Without PHP and SHOW TALBES / SHOW COLUMNS, we will need to write endless code to change all tables and all columns.

mysql> select collation('asdf'); # default collation
+-------------------+
| collation('asdf') |
+-------------------+
| latin1_swedish_ci |
+-------------------+
1 row in set (0.00 sec)

Download the UTF-8 DB Converter for WordPress

The critical part of this plugin is:

....
$sql2 = "ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci";
....
$sql4 = "ALTER TABLE $table CHANGE `$field_name` `$field_name` $field_type
         CHARACTER SET utf8 COLLATE utf8_bin";
....

I don’t think I need to describe the installation again, but I sould warn you again about possible data lost during the conversion. Do back up early please.

Nov 052008
Yesterday I took a look at the Google APIs for visualization; the gallery is poor for data visualization, I think. Anyway, an idea came to my mind – to make use of the MotionChart to demonstrate the Brownian Motion. Random numbers can be generated from R and read into JavaScript (the DataTable object in Google API). Here is an example:

<script type="text/javascript" src="http://www.google.com/jsapi"></script>
<script type="text/javascript">
  google.load("visualization", "1", {packages:["motionchart"]});
  google.setOnLoadCallback(drawChart);
  function drawChart() {
	var data = new google.visualization.DataTable();
	data.addRows(750);
	data.addColumn('string', 'point');
	data.addColumn('number', 'year');
	data.addColumn('number', 'X');
	data.addColumn('number', 'Y');
	data.setValue(0, 0, "01");
	data.setValue(0, 1, 1901);
	data.setValue(0, 2, 1.24);
	data.setValue(0, 3, -0.37);
	data.setValue(1, 0, "02");
	data.setValue(1, 1, 1901);
	data.setValue(1, 2, 0.12);
	data.setValue(1, 3, 0.48);
	data.setValue(2, 0, "03");
	data.setValue(2, 1, 1901);
	data.setValue(2, 2, -1.5);
	data.setValue(2, 3, -1.21);
	....
	data.setValue(748, 0, "14");
	data.setValue(748, 1, 1950);
	data.setValue(748, 2, 1.73);
	data.setValue(748, 3, -2.24);
	data.setValue(749, 0, "15");
	data.setValue(749, 1, 1950);
	data.setValue(749, 2, 2.38);
	data.setValue(749, 3, 16.65);
	var chart = new google.visualization.MotionChart(document.getElementById('chart_div'));
	chart.draw(data, {width: 600, height: 500});", "      }
</script>

<div id="chart_div" style="width: 600px; height: 500px;"></div>

[Preview Brownian Motion with GoogleVis API]

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2010 by Yihui Xie