Computer Science

I’m interested in computer technologies, especially in software; I know very little about hardware

Mar 12, 2010

Stephanie asked in 511 today whether we could recover the random seed set by set.seed() when we are given only the random numbers and not the seed itself. This kind of “hacker” question sounds interesting. One dirty solution is the brute-force method, e.g.:

# x: the random vector;
# FUN: the function that generates random numbers with the first argument
#      being the length of random numbers
# seed: candidate seeds to be tried one by one
# ...: other arguments to be passed to FUN
find.seed = function(x, FUN = rnorm, seed = 0:10000, ...) {
    res = NULL
    for (i in seed) {
        set.seed(i)
        rx = FUN(length(x), ...)
        # replace all(x == rx) with isTRUE(all.equal(x, rx)) for a rough
        #     solution that tolerates small numeric errors
        if (all(x == rx)) {
            res = i
            break
        }
    }
    res
}
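
The same brute-force idea can be sketched in Python. This is only an analogue, not a port: Python's generator differs from R's, so it recovers seeds of Python's random module only, and the name find_seed and its arguments are made up for illustration.

```python
import random


def find_seed(x, seeds=range(0, 10001), mu=0.0, sigma=1.0):
    """Return the first candidate seed that reproduces x exactly, or None."""
    for seed in seeds:
        # A private generator per candidate; the global random state is untouched.
        rng = random.Random(seed)
        candidate = [rng.gauss(mu, sigma) for _ in x]
        if candidate == x:
            return seed
    return None


# Usage: draw a few numbers with a "secret" seed, then try to recover it.
secret = random.Random(2010)
x = [secret.gauss(0.0, 1.0) for _ in range(5)]
recovered = find_seed(x)
```

As in the R version, the search only succeeds if the true seed is among the candidates and the numbers were not rounded before we received them.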

Feb 18, 2010

For a long time I’ve been wondering why we are not able to use Enter in the LyX Scrap environment which was set up by Gregor Gorjanc for Sweave. Two weeks ago, I (finally!) could not help asking Gregor about this issue, as I’m using “LyX + Sweave” more and more in my daily work. He explained it here: LyX-Sweave: mandatory use of control+enter in code chunks

After digging into the LyX customization manual for a while, I found a solution which allows us to press the Enter key just as we normally do when typing in a LyX document. The key is to use Environment instead of Paragraph as the LatexType in the style definition of Scrap. Besides, I set the LatexName to wrapsweave, as a LatexName is required by LyX. The definition of wrapsweave is simple: just two empty lines produced by \par. (If you define it as \newenvironment{wrapsweave}{}{}, you will sometimes run into trouble, especially when paragraphs are indented.)

As we know, LaTeX environments cannot be centered in LyX (only paragraphs can), so I defined a special environment ScrapCenter for when I want to insert graphics via Sweave and make them center-aligned.

Sep 26, 2009

As Sir Francis Bacon said, “Histories make men wise; poets witty; the mathematics subtile; natural philosophy deep; moral grave; logic and rhetoric able to contend.” And Windows stupid.

He would have added the last sentence if he were a Windows user in this age.

1. Avoid Using M$ Excel

A lot of R users often ask this question: “How to import MS Excel data into R?” Well, my suggestion is, avoid using M$ Excel if you are a statistician (or going to be a statistician) because you just cannot imagine how messy Excel data can be: some cells might be merged, some are colored, some texts are bold, several data tables can be put everywhere (e.g. cell(1,1) to (10,4), and (17,3) to (25,9)), stupid bar plots and pie charts are inserted in the sheets, silly statistical procedures that are wrong forever… If you don’t trust my words (yes, I’m a nobody), just read the examples here: Problems with Excel (collected by Prof Harrell).

I know there are reasons for you to continue using Excel: your boss requires it; you don’t have time to learn about other data formats; everybody is using Excel, and you don’t want to be so cool as to use R; or if you finish your tasks too quickly and accurately, your boss will doubt whether you have really spent time working, and you will be paid less (this is a REAL story for me: I was not paid less, but I was indeed doubted when I used R); …

Jun 10, 2009

A tag cloud is a bunch of words drawn in a graph with their sizes proportional to their frequencies; it’s widely used in blogs to visualize tags. We can spot important words quickly in a tag cloud, as they appear in a large font size. Tony N. Brown asked how to “graphically represent frequency of words in a speech” the other day on the R-help list, which is actually a problem about the tag cloud:

I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. [...]

Marc Schwartz mentioned that Gregor Gorjanc did some work on this years ago using R (in grid graphics). The obstacle to creating a tag cloud in R, as Gorjanc wrote, lies in deciding the placement of words, and it would be much easier for other applications such as browsers to arrange the text. That’s true: there are already a lot of mature programs that deal with tag clouds. One of them is the wp-cumulus plugin for WordPress, which uses a Flash object to generate the tag cloud and gives the cloud a fantastic 3D rotation effect.
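
Placement is the hard part, but the sizing rule itself (font size proportional to word frequency) is easy to sketch. The Python snippet below is a minimal illustration under my own assumptions: the function name tag_sizes, the max_size parameter and the crude word-splitting rule are all hypothetical.

```python
from collections import Counter
import re


def tag_sizes(text, max_size=40.0):
    """Map each word to a font size proportional to its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; the others scale linearly with count.
    return {word: max_size * count / top for word, count in counts.items()}


sizes = tag_sizes("data data data plot plot model", max_size=30.0)
```

Real tag clouds often apply a minimum size or a log scale so that rare words stay readable; that choice is left out here.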

1. Arranging text labels with pointLabel()

Before introducing how to port the plugin into R, I’d like to introduce the R function pointLabel() in the maptools package, which can partially solve the problem of arranging text labels in a plot (using simulated annealing or a genetic algorithm). Here is a simulated example:

Simulated Tag Cloud with R function pointLabel() in maptools

Jun 04, 2009

Currently I haven’t found a good plugin to insert Google Adsense code into the bbPress forum (for the plugin adsense-for-bbpress, I don’t like the idea of posting Adsense code as if it were a post), so I opened the template file post.php and manually inserted the code as:

<div class="threadpost">
	<div class="post">
	<?php
	if ($bb_alt['post'] == '0' || (is_int($bb_alt['post']/2) && rand(0, 20) == 10)) {
	?>
		<span style="float:right;padding-left:1em;">
		<script type="text/javascript"><!--
		google_ad_client = "pub-2679974521646557";
		/* 200x200 @ EN */
		google_ad_slot = "0041982581";
		google_ad_width = 200;
		google_ad_height = 200;
		//-->
		</script>
		<script type="text/javascript"
		src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
		</script></span>
	<?php } ?>
	<?php post_text(); ?>
	</div>
	<div class="poststuff"><?php printf( __('Posted %s ago'), bb_get_post_time() ); ?>
	<a href="<?php post_anchor_link(); ?>">#</a> <?php post_ip_link(); ?>
	<?php post_edit_link(); ?> <?php post_delete_link(); ?></div>
</div>

The variable $bb_alt['post'] records the order of a post as 1, 2, …, n. My code above makes sure that the Adsense code will appear

  1. in the top post with probability 100% (guaranteed by $bb_alt['post'] == '0');
  2. randomly in posts of even order (is_int($bb_alt['post']/2)) with probability 1/21 ≈ 5% (rand(0, 20) == 10), although this probability depends on the RNG in PHP.

To a statistician, the only thing that is not random is “everything is random”. :grin:
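
The 1/21 figure comes from rand(0, 20) being inclusive at both ends, which gives 21 possible values of which exactly one equals 10. A small Python sketch of the rule, assuming the PHP generator is uniform (ad_probability is a hypothetical helper, not part of bbPress):

```python
from fractions import Fraction


def ad_probability(order):
    """Probability that the ad is shown for a post with the given order value."""
    if order == 0:
        return Fraction(1)  # the top post always gets the ad
    if order % 2 == 0:
        # rand(0, 20) yields 21 equally likely integers; only 10 triggers the ad.
        return Fraction(1, 21)
    return Fraction(0)  # posts of odd order never get the ad


p_even = ad_probability(2)
```

Here float(p_even) is about 0.048, which is the "roughly 5%" quoted above.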

May 30, 2009

This plugin is based on the post “Convert MySQL Tables to UTF-8” and an existing plugin by g30rg3_x. The reason I modified their code is that it converts all tables in your database to the UTF-8 charset, whereas we only need to convert the WP tables, so I changed the code "SHOW TABLES" to "SHOW TABLES LIKE '" . $table_prefix . "%'", which guarantees that other tables stay untouched. Besides, g30rg3_x’s purpose was to alter the charset of old WP databases to UTF-8, but in fact we also need to change the charset after moving our DB to a new host where the default charset is not UTF-8. Judging from my experience, the default charset/collation for many web hosts is latin1/latin1_swedish_ci (I don’t know why), whereas popular web-building systems often use utf8/utf8_general_ci, so we need to change the charset before all content can be displayed normally. Without PHP and SHOW TABLES / SHOW COLUMNS, we would need to write endless code to change all tables and all columns.

mysql> select collation('asdf'); # default collation
+-------------------+
| collation('asdf') |
+-------------------+
| latin1_swedish_ci |
+-------------------+
1 row in set (0.00 sec)

Download the UTF-8 DB Converter for Wordpress

The critical part of this plugin is:

....
$sql2 = "ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci";
....
$sql4 = "ALTER TABLE $table CHANGE `$field_name` `$field_name` $field_type
         CHARACTER SET utf8 COLLATE utf8_bin";
....
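
To show how these two statements combine with the table-prefix filter, here is a rough Python sketch that only builds the SQL strings. It is not the plugin's code: conversion_statements is a hypothetical helper, the schema is passed in by hand instead of being read with SHOW TABLES / SHOW COLUMNS, and it assumes the caller lists text columns only.

```python
def conversion_statements(tables, table_prefix="wp_"):
    """Build ALTER statements; tables maps a table name to (column, type) pairs."""
    statements = []
    for table, columns in tables.items():
        if not table.startswith(table_prefix):
            continue  # leave non-WordPress tables untouched
        statements.append(
            f"ALTER TABLE `{table}` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci"
        )
        for name, col_type in columns:
            statements.append(
                f"ALTER TABLE `{table}` CHANGE `{name}` `{name}` {col_type} "
                "CHARACTER SET utf8 COLLATE utf8_bin"
            )
    return statements


# Hypothetical schema: one WordPress table and one unrelated table.
schema = {
    "wp_posts": [("post_title", "text")],
    "other_log": [("message", "text")],
}
sql = conversion_statements(schema)
```

Only wp_posts produces statements here; other_log is skipped because it lacks the prefix.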

I don’t think I need to describe the installation again, but I should warn you again about possible data loss during the conversion. Please back up your database first.

Nov 05, 2008

Yesterday I took a look at the Google APIs for visualization; the gallery is poor for data visualization, I think. Anyway, an idea came to my mind: to make use of the MotionChart to demonstrate Brownian motion. Random numbers can be generated in R and written into JavaScript (the DataTable object in the Google API). Here is an example:

<script type="text/javascript" src="http://www.google.com/jsapi"></script>
<script type="text/javascript">
  google.load("visualization", "1", {packages:["motionchart"]});
  google.setOnLoadCallback(drawChart);
  function drawChart() {
	var data = new google.visualization.DataTable();
	data.addRows(750);
	data.addColumn('string', 'point');
	data.addColumn('number', 'year');
	data.addColumn('number', 'X');
	data.addColumn('number', 'Y');
	data.setValue(0, 0, "01");
	data.setValue(0, 1, 1901);
	data.setValue(0, 2, 1.24);
	data.setValue(0, 3, -0.37);
	data.setValue(1, 0, "02");
	data.setValue(1, 1, 1901);
	data.setValue(1, 2, 0.12);
	data.setValue(1, 3, 0.48);
	data.setValue(2, 0, "03");
	data.setValue(2, 1, 1901);
	data.setValue(2, 2, -1.5);
	data.setValue(2, 3, -1.21);
	....
	data.setValue(748, 0, "14");
	data.setValue(748, 1, 1950);
	data.setValue(748, 2, 1.73);
	data.setValue(748, 3, -2.24);
	data.setValue(749, 0, "15");
	data.setValue(749, 1, 1950);
	data.setValue(749, 2, 2.38);
	data.setValue(749, 3, 16.65);
	var chart = new google.visualization.MotionChart(document.getElementById('chart_div'));
	chart.draw(data, {width: 600, height: 500});
  }
</script>

<div id="chart_div" style="width: 600px; height: 500px;"></div>

[Preview Brownian Motion with GoogleVis API]
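
The data.setValue() lines were generated from R; the generating code is not shown here, so the following Python sketch is only a guess at an equivalent. It assumes 15 points observed over 50 "years" (15 × 50 = 750 rows, matching the listing) and standard normal increments; the function names and the seed are hypothetical.

```python
import random


def brownian_rows(n_points=15, n_steps=50, start_year=1901, seed=1):
    """Simulate 2-D random walks; return (label, year, x, y) rows, year by year."""
    rng = random.Random(seed)
    positions = [(0.0, 0.0)] * n_points
    rows = []
    for step in range(n_steps):
        moved = []
        for p, (x, y) in enumerate(positions):
            # Each coordinate moves by an independent standard normal increment.
            x += rng.gauss(0.0, 1.0)
            y += rng.gauss(0.0, 1.0)
            moved.append((x, y))
            rows.append((f"{p + 1:02d}", start_year + step, round(x, 2), round(y, 2)))
        positions = moved
    return rows


def to_set_value_lines(rows):
    """Turn the rows into data.setValue(row, col, value); JavaScript lines."""
    lines = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            literal = f'"{value}"' if isinstance(value, str) else str(value)
            lines.append(f"data.setValue({i}, {j}, {literal});")
    return lines


rows = brownian_rows()
lines = to_set_value_lines(rows)
```

Pasting the joined lines between the addColumn() calls and the chart.draw() call reproduces the structure of the listing above, with different random values.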

Oct 12, 2008

I’d like to thank Prof Michael Friendly for telling me this: just create a hyperlink to the file with the run protocol and then you can open the file directly from the hyperlink. For example, in LaTeX with the hyperref package, you may use \href{run:path/to/some.file}{some link} to open some.file from your PDF. This is a useful hack that I had been looking for for a long time.

Jun 20, 2008

To convert PDF to SWF, we may use the utility pdf2swf in SWF Tools. However, the option for specifying the frame rate has not been documented yet. The other day Matthias Kramm told me that this option is just hidden behind -s.

Usage: pdf2swf [-options] file.pdf -o file.swf

-h , --help               Print short help message and exit
-V , --version            Print version info and exit
....
-p , --pages range        Convert only pages in range with range e.g. 1-20 or 1,4,6,9-11 or
....
-s , --set param=value    Set a SWF encoder specific parameter.
                          See pdf2swf -s help for more information.
....

To specify the frame rate, we just need to use pdf2swf -s framerate=?, with ? replaced by the desired value.
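
When the conversion is scripted, the command line can be assembled programmatically. Below is a minimal Python sketch that only builds the argument list from the options shown above (-s and -o); the helper name pdf2swf_command and the file names are hypothetical, and nothing is executed.

```python
import shlex


def pdf2swf_command(pdf_path, swf_path, framerate):
    """Build the pdf2swf argument list with the frame rate passed through -s."""
    return ["pdf2swf", "-s", f"framerate={framerate}", pdf_path, "-o", swf_path]


cmd = pdf2swf_command("slides.pdf", "slides.swf", 2)
command_line = shlex.join(cmd)  # shell-quoted string for display or logging
```

To actually run the conversion, the list can be handed to subprocess.run(cmd, check=True), provided pdf2swf is installed and on the PATH.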

Jun 13, 2008

SWF Tools provides several utilities for the manipulation and creation of Flash files. Today I wrote a wrapper saveSWF() in the package “animation” to convert image frames to Flash animations. Here is an example for the kNN algorithm:

I’d like to thank Hadley for telling me about this tool set. The function saveSWF() will appear in animation 1.0-1.

So far, there are four kinds of animations in the animation package: (1) animations inside R windows graphics devices; (2) animations in HTML pages (driven by JavaScript); (3) GIF or AVI animations with the help of “ImageMagick”; (4) Flash animations with the help of “SWF Tools”.

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2010 by Yihui Xie