Jun 102009
Tag cloud is a bunch of words drawn in a graph with their sizes proportional to their frequency; it’s widely used in blogs to visualize tags. We can observe important words quickly from a tag cloud, as they often appear in large fontsize. Tony N. Brown asked how to “graphically represent frequency of words in a speech” the other day in R-help list, which is actually a problem about the tag cloud:

I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. [...]

Marc Schwartz mentioned that Gorjanc Gregor has done some work years ago using R (in grid graphics). The obstacle of creating tag cloud in R, as Gorjanc wrote, lies in deciding the placement of words, and it would be much easier for other applications such as browsers to arrange the texts. That’s true — there have already been a lot of mature programs to deal with tag cloud. One of them is the wp-cumulus plugin for WordPress, which makes use of a Flash object to generate the tag cloud, and it has fantastic 3D rotation effect of the cloud.

1. Arranging text labels with pointLabel()

Before introducing how to port the plugin into R, I’d like to introduce an R function pointLabel() in maptools package and it can partially solve the problem of arranging text labels in a plot (using simulated annealing or genetic algorithm). Here is a simulated example:

Simulated Tag Cloud with R function pointLabel() in maptools

Simulated Tag Cloud with R function pointLabel() in maptools

May 302009
This plugin is based on the post “Convert MySQL Tables to UTF-8” and an existing plugin by g30rg3_x. The reason I modified their code is that they will convert all tables in your database to the UTF-8 charset, but what we need is to convert WP tables, so I changed the code "SHOW TABLES" to "SHOW TABLES LIKE " . $table_prefix . "%", which will guarantee other tables could stay untouched. Besides, g30rg3_x’s purpose was to alter the charset of old WP databases to new UTF-8 databases, but in fact we also need to change the charset after we moved our DB to a new host when the charset is not UTF-8 by default. Judging from my experience, the default charset/collation for many web hosts is latin1/latin1_swedish_ci (I don’t know why), whereas popular web-buidling systems often use utf8/utf8_general_ci, thus we need to change the charset before all content could be normally displayed. Without PHP and SHOW TALBES / SHOW COLUMNS, we will need to write endless code to change all tables and all columns.

mysql> select collation('asdf'); # default collation
+-------------------+
| collation('asdf') |
+-------------------+
| latin1_swedish_ci |
+-------------------+
1 row in set (0.00 sec)

Download the UTF-8 DB Converter for WordPress

The critical part of this plugin is:

....
$sql2 = "ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci";
....
$sql4 = "ALTER TABLE $table CHANGE `$field_name` `$field_name` $field_type
         CHARACTER SET utf8 COLLATE utf8_bin";
....

I don’t think I need to describe the installation again, but I sould warn you again about possible data lost during the conversion. Do back up early please.

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2010 by Yihui Xie