I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. [...]
Marc Schwartz mentioned that Gregor Gorjanc did some work on this years ago using R (in grid graphics). The main obstacle to creating a tag cloud in R, as Gorjanc wrote, lies in deciding the placement of words, and it would be much easier to let other applications such as browsers arrange the text. That’s true: there are already many mature programs for tag clouds. One of them is the wp-cumulus plugin for WordPress, which uses a Flash object to generate the tag cloud and has a fantastic 3D rotation effect.
1. Arranging text labels with pointLabel()
Before showing how to port the plugin into R, I’d like to introduce the function pointLabel() in the maptools package, which can partially solve the problem of arranging text labels in a plot (using simulated annealing or a genetic algorithm). Here is a simulated example:
library(maptools)
set.seed(123)
x = runif(19)
y = runif(19)
w = c("R", "is", "free", "software", "and", "comes",
"with", "ABSOLUTELY", "NO", "WARRANTY", "You", "are", "welcome",
"to", "redistribute", "it", "under", "certain", "conditions")
par(ann = FALSE, xpd = NA, mar = rep(2, 4))
plot(x, y, type = "n", axes = FALSE)
pointLabel(x, y, w, cex = runif(19, 1, 5))
I was fortunate to get a very neat graph with no overlapping labels, but I don’t think this is a good solution, as it does not take the initial locations of the words into account. My rough idea for deciding the initial locations is to sample points on circles whose radii are inversely proportional to the frequency, i.e. let $x_i = r_i \cos\theta_i$ and $y_i = r_i \sin\theta_i$, where $r_i \propto 1/f_i$ ($f_i$ being the frequency of the $i$-th word) and $\theta_i \sim U(0, 2\pi)$. In this case, important words will be placed near the center of the plot.
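The sampling idea above can be sketched in R as follows. circleLayout() is a hypothetical helper written only for this illustration: it draws each angle uniformly from $(0, 2\pi)$ and uses radii inversely proportional to frequency (rescaled so the largest radius is 1), so that frequent words land near the center.

```r
# Hypothetical helper (not from any package): place each word on a circle
# whose radius shrinks as the word's frequency grows, at a random angle.
circleLayout = function(words, freq) {
  stopifnot(length(words) == length(freq), all(freq > 0))
  r = 1 / freq                              # r_i proportional to 1 / f_i
  r = r / max(r)                            # rescale radii into (0, 1]
  theta = runif(length(words), 0, 2 * pi)   # theta_i ~ U(0, 2*pi)
  data.frame(word = words, freq = freq,
             x = r * cos(theta), y = r * sin(theta),
             stringsAsFactors = FALSE)
}

set.seed(123)
w = c("R", "is", "free", "software")
f = c(10, 4, 2, 1)
pos = circleLayout(w, f)
# distance from the origin equals the rescaled radius
round(sqrt(pos$x^2 + pos$y^2), 2)   # should print 0.10 0.25 0.50 1.00
```

The resulting x and y could then be handed to pointLabel() as label anchors; how much of the circular arrangement survives once overlaps are resolved has not been tested here.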
2. Creating tag cloud in a Flash movie using R
The problem becomes quite easy with a Flash movie tagcloud.swf and a JavaScript program swfobject.js. Briefly, the mechanism is that the tag information is passed to the Flash object by JavaScript, and the Flash object reads the variable tagcloud, in which the sizes, colors and hyperlinks of the tags are stored. Finally, the tags are rendered as a rotating cloud.
It’s not difficult to pass the tag information to JavaScript in pure text. Below is the function which will create an HTML page by default with a tag cloud Flash movie inside it:
Download the source code: tagCloud.r.gz (1.18Kb)

#------------------------------------------------------------------------------#
# generating tag cloud in R using Flash and SWFObject #
# tagData: a data.frame containing columns 'tag', 'link', 'count' and optional #
# columns 'color' and 'hicolor' #
# other parameters are self-explanatory if you are familiar with #
# the WP plugin 'wp-cumulus' #
#------------------------------------------------------------------------------#
tagCloud = function(tagData, htmlOutput = "tagCloud.html",
SWFPath, JSPath, divId = "tagCloudId", width = 600, height = 400,
transparent = FALSE, tcolor = "333333", tcolor2 = "009900",
hicolor = "ff0000", distr = "true", tspeed = 100, version = 9,
bgcolor = "ffffff", useXML = FALSE, htmlTitle = "Tag Cloud",
noFlashJS, target = NULL, scriptOnly = FALSE) {
if (missing(SWFPath))
SWFPath = "http://www.roytanck.com/wp-content/plugins/wp-cumulus/tagcloud.swf"
if (missing(JSPath))
JSPath = "http://www.roytanck.com/wp-content/plugins/wp-cumulus/swfobject.js"
if (missing(noFlashJS))
noFlashJS = "This will be shown to users with no Flash or Javascript."
# build the <tags> XML string; the optional color/hicolor attributes are
# added only for tags whose color value is not NA
tagXML = sprintf("<tags>%s</tags>", paste(sprintf("<a href='%s' style='%s'%s%s%s>%s</a>",
  tagData$link, tagData$count,
  if (is.null(target)) "" else sprintf(" target='%s'", target),
  if (is.null(tagData$color)) "" else
    ifelse(is.na(tagData$color), "", sprintf(" color='0x%s'", tagData$color)),
  if (is.null(tagData$hicolor)) "" else
    ifelse(is.na(tagData$hicolor), "", sprintf(" hicolor='0x%s'", tagData$hicolor)),
  tagData$tag), collapse = ""))
if (useXML)
cat(tagXML, file = file.path(dirname(htmlOutput), "tagcloud.xml"))  # same file name as the xmlpath passed to Flash
cat(ifelse(scriptOnly, "",
sprintf("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"
\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<html xmlns=\"http://www.w3.org/1999/xhtml\">
<head>
<title>%s</title>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
</head>
<body>",
htmlTitle)), sprintf("\t<script type=\"text/javascript\" src=\"%s\"></script>",
JSPath), sprintf("\t<div id=\"%s\">%s</div>", divId,
noFlashJS), sprintf("\t<script type=\"text/javascript\">
\t\tvar so = new SWFObject(\"%s\", \"tagcloud\", \"%d\", \"%d\", \"%d\", \"#%s\");
%s\t\tso.addVariable(\"mode\", \"tags\");\n\t\tso.addVariable(\"tcolor\", \"0x%s\");
\t\tso.addVariable(\"tcolor2\", \"0x%s\");\n\t\tso.addVariable(\"hicolor\", \"0x%s\");
\t\tso.addVariable(\"tspeed\", \"%d\");\n\t\tso.addVariable(\"distr\", \"%s\");
%s\n\t\tso.write(\"%s\");\n\t\t</script>\n",
SWFPath, width, height, version, bgcolor, ifelse(transparent,
"\t\tso.addParam(\"wmode\", \"transparent\");\n",
""), tcolor, tcolor2, hicolor, tspeed, distr, ifelse(useXML,
"\t\tso.addVariable(\"xmlpath\", \"tagcloud.xml\");",
sprintf("\t\tso.addVariable(\"tagcloud\", \"%s\");",
tagXML)), divId), ifelse(scriptOnly, "", "</body>\n\n</html>"),
file = if (scriptOnly) "" else htmlOutput, sep = "\n")  # "" means standard output
}
The main argument is tagData, a data.frame with at least three columns (tag, link and count), which looks like this:
> head(tagData)
tag link count
1 2D Kernel Density http://yihui.name/en/tag/2d-kernel-density/ 1
2 algorithm http://yihui.name/en/tag/algorithm/ 1
3 Animation http://yihui.name/en/tag/animation/ 11
4 AniWiki http://yihui.name/en/tag/aniwiki/ 2
5 Arcing http://yihui.name/en/tag/arcing/ 1
6 arrows() http://yihui.name/en/tag/arrows/ 1
Optional columns color and hicolor (RGB colors written as six hexadecimal digits) are used if they exist, e.g.
> head(tagData)
tag link count color hicolor
1 2D Kernel Density http://yihui.name/en/tag/2d-kernel-density/ 1 2163bb f0763d
2 algorithm http://yihui.name/en/tag/algorithm/ 1 9f0f38 d825b1
3 Animation http://yihui.name/en/tag/animation/ 11 800130 5b8d6a
4 AniWiki http://yihui.name/en/tag/aniwiki/ 2 7ce1df 6607b0
5 Arcing http://yihui.name/en/tag/arcing/ 1 df4e4a f5cdf2
6 arrows() http://yihui.name/en/tag/arrows/ 1 31f5fb 19d50d
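For readers without a tag database at hand, a tagData object of the shape shown above can be assembled by hand. The tags, counts and example.com links below are made-up placeholders, and randomHex() is a hypothetical helper that returns six-digit hex RGB strings without a "#" prefix, matching the color/hicolor columns above.

```r
# Hypothetical helper: n random colors as 6-digit lowercase hex strings
randomHex = function(n) {
  sprintf("%02x%02x%02x",
          sample(0:255, n, replace = TRUE),
          sample(0:255, n, replace = TRUE),
          sample(0:255, n, replace = TRUE))
}

# placeholder tags, links and counts, for illustration only
tags = c("Animation", "Graphics", "R")
tagData = data.frame(
  tag = tags,
  link = paste0("http://example.com/tag/", tolower(tags), "/"),
  count = c(11, 4, 7),
  stringsAsFactors = FALSE
)
set.seed(42)
tagData$color = randomHex(nrow(tagData))
tagData$hicolor = randomHex(nrow(tagData))
tagData
```

The resulting data.frame has the same columns as the example above and can be passed directly to tagCloud().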
3. Example
Here is an example of visualizing my blog tags. If you want the page to load faster, download the two files tagcloud.swf and swfobject.js first (by default, your browser has to fetch them from roytanck.com).
tagCloud(tagData)
# or use tagCloud(tagData, SWFPath = "tagcloud.swf", JSPath = "swfobject.js")
# if you have downloaded these files to your working directory, i.e. getwd();
# this will save you a few seconds of loading the Flash
The above code generates an HTML page with the rotating tag cloud embedded in it.
You can adjust the parameters as you wish.
4. Other issues
There is still one more step needed to answer Tony’s original question, namely splitting the speech into single words and computing their frequencies. This can be done (roughly) with strsplit(..., split = " ") and table().
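A rough sketch of that last step: split a speech into words, count them, and build a data.frame in the shape tagCloud() expects. speechToTags() is a hypothetical helper; splitting on runs of non-letters (rather than a single space) is an assumption made here so that punctuation is dropped, and the example.com search links are placeholders.

```r
# Hypothetical helper: turn a speech into a tagData-style data.frame
speechToTags = function(speech, baseURL = "http://example.com/search?q=") {
  # lower-case the text and split on anything that is not a letter or apostrophe
  words = unlist(strsplit(tolower(speech), "[^a-z']+"))
  words = words[words != ""]   # drop the empty string left by a leading separator
  freq = sort(table(words), decreasing = TRUE)
  data.frame(tag = names(freq),
             link = paste0(baseURL, names(freq)),
             count = as.vector(freq),
             stringsAsFactors = FALSE)
}

speech = "R is free software and comes with ABSOLUTELY NO WARRANTY. R is free!"
tagData = speechToTags(speech)
head(tagData, 3)
```

In practice one would probably also drop common stop words (e.g. "is", "and") before counting.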
The above code may have encoding problems, but URLencode(tagXML) could help.
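A minimal sketch of that encoding step, assuming the Flash movie URL-decodes the tagcloud variable it receives. With reserved = TRUE, URLencode() also escapes characters such as &, <, >, ' and =, which would otherwise break the HTML/JavaScript string; the tag string below is a made-up example.

```r
# a made-up tag XML string containing a problematic "&"
tagXML = "<tags><a href='http://example.com/r-and-s/' style='3'>R & S</a></tags>"

# percent-encode everything except unreserved characters
encoded = URLencode(tagXML, reserved = TRUE)
encoded

# decoding should give back the original string
identical(URLdecode(encoded), tagXML)
```

Without reserved = TRUE, characters like & and = are left as they are, which is why the plain URLencode(tagXML) call may not be enough.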
Only Latin characters are supported, but it is possible to modify the Flash source file to support other languages. See Roy Tanck’s post for more information.
Other R resources I know of so far:
- The R package R4X by Romain François: you can generate an HTML page containing the tags with dynamic classes attached to the <span> tags (install the package and read its vignette: install.packages('R4X', repos='http://r-forge.r-project.org'); vignette('r4xslides', package='R4X'))
- The R package snippets by Simon Urbanek: there is a function cloud() to create a word cloud; words are arranged from top to bottom and left to right
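Without reproducing the snippets package’s code, a top-to-bottom, left-to-right arrangement can be sketched as a simple greedy row-packing layout. rowLayout() is a hypothetical helper; the word widths and heights are taken as inputs (e.g. from strwidth() and strheight() on an open graphics device).

```r
# Hypothetical greedy layout: place words left to right along a row and wrap
# to a new row when the next word would exceed maxWidth.
# Returns the top-left corner (x, y) of each word; rows grow downwards.
rowLayout = function(widths, heights, maxWidth, gap = 0.01) {
  n = length(widths)
  x = y = numeric(n)
  curX = 0; curY = 0; rowH = 0
  for (i in seq_len(n)) {
    if (curX > 0 && curX + widths[i] > maxWidth) {
      curX = 0                     # wrap: start a new row ...
      curY = curY - rowH - gap     # ... below the tallest word of the last row
      rowH = 0
    }
    x[i] = curX
    y[i] = curY
    curX = curX + widths[i] + gap
    rowH = max(rowH, heights[i])
  }
  data.frame(x = x, y = y)
}

pos = rowLayout(widths = c(0.4, 0.3, 0.5, 0.2),
                heights = c(0.1, 0.1, 0.2, 0.1),
                maxWidth = 1, gap = 0)
pos
```

To draw the words, one could call text(pos$x, pos$y, words, adj = c(0, 1), cex = cex), so that each (x, y) is the top-left corner of its word; the widths passed in must be measured with the same cex.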

Great post as usual Yihui! Very neat and useful!
Thanks, Paolo. I’m glad it’s useful for you. I’ll add it to my R package fun later: https://r-forge.r-project.org/projects/fun/
Yihui, thanks very much for this!
Question: If you set useXML = T, do you have to specify any additional parameters or alter the XML file in some way? The cloud works just fine with the plain text, but nothing displays when I try to use the XML option.
Sorry, I didn’t realize that the Flash file 'tagcloud.swf' would not work with an XML file that is not in the same directory as 'tagcloud.swf'. When you use useXML = TRUE, you need to make sure 'tagcloud.swf' and 'tagcloud.xml' are in the same directory.
I’ve updated the source code tagCloud.r.gz (with a warning when 'tagcloud.swf' cannot find the XML file), or you can install the fun package from R-Forge in about a day: install.packages("fun", repos = "http://r-forge.r-project.org").
Hi,
Thanks for the great tool. However, the links do not work. Any suggestions? Thanks.
John
Great post! Thanks.
I tried to add a tag cloud to my web page, only to find out that the tagcloud.swf and swfobject.js links provided in the post do not work. I finally managed to extract the two files used on your homepage (yihui.name). Perhaps you should update the links in this post? Thanks.
Thanks for the reminder! I’ve changed the links in the R package fun; you may install it and use the function tagCloud (see it after install.packages("fun", repos = "http://r-forge.r-project.org")).
Thank you for sharing with the community!
I am looking to change the size of the window. Where do I look for it: in the JavaScript or the ActionScript?
Thanks everyone.
Did you see a line of JavaScript code like this?
var so = new SWFObject("http://yihui.name/en/wp-content/uploads/2009/06/tagcloud.swf", "tagcloud", "600", "400", "9", "#ffffff");
That’s where you can modify the width (600px) and height (400px) of the Flash animation (the width and height arguments of tagCloud() set these values).
Thank you for sharing. Could you tell me how I could change this tag cloud from being round to being rectangular? Probably obvious to you
Thanks!
Sorry, but I don’t think that is currently possible with the Flash tag cloud.
Yihui,
thanks for sharing your code. I am not sure it is working for me.
After running it, it says “HTML file created at tagcloud.html”, although I cannot
find any output. Can you help me understand what I am missing?
Barry
The HTML file is written to your working directory by default, and you can view it with browseURL(paste("file://", file.path(getwd(), "tagCloud.html"), sep = "")). Or just get the working directory with getwd(), go there and look for the file ‘tagCloud.html’.
Yihui,
I got your code to work the other day with a simple data frame consisting of three tags. However, today when I added more tags, I got the “This will be shown to users with no Flash or Javascript” message. I did not change anything else, and do not understand how the data frame could relate to the error message. Any ideas?
What the message actually means is that there is some error in the JavaScript or Flash. I guess there must be some special HTML characters in your tags (e.g. check whether there are characters like “<” or “&”).
If you are using the fun package, the argument encode = TRUE in the function tagCloud() might help.
Yihui,
yes, you were right: there was a “&” in one of my tags, which caused the problem
(encode = TRUE did not fix it, though). Thanks again for taking time out of your
busy day to help a JavaScript newbie!
Barry
Thanks for pointing the problem out. I’ll consider it a potential bug, which can be fixed by applying the function below to the tags first:
htmlspecialchars = function(string) {
  x = c("&", "\"", "'", "<", ">")
  subx = c("&amp;", "&quot;", "&#039;", "&lt;", "&gt;")
  # "&" is replaced first so the entities inserted afterwards are not escaped again
  for (i in seq_along(x)) {
    string = gsub(x[i], subx[i], string, fixed = TRUE)
  }
  string
}
I’ll commit the change to the fun package soon.
Yihui, one last thing: I noticed the links do not work. The cursor changes to a pointing finger as if it is about to follow the link, but nothing happens. Where should I look?
Barry
Sorry, but I have no idea what’s going on…
Yihui, I added “target=_blank” to the HREF part of the HTML to get the links working. But where in the JavaScript should I put your htmlspecialchars to handle the special characters?
your favorite pest Barry
I’ve just committed the changes to tagCloud() in the fun package, with a new function htmlspecialchars() added to the package. Now your earlier problem should be solved. You can view the change made to tagCloud() here: https://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/fun/R/tagCloud.R?root=fun&rev=27&r1=6&r2=27
Well, it seems hard for me to use = instead of “<-”.
The post showed me a way to use R to generate an .html file. Is there any way, or any R library, that supports communication with JavaScript in real time? Thanks
I use ‘=’ just because it saves typing time, although R gurus strongly recommend using ‘<-’.
As for your question, all I know is that there are R packages which can convert R objects (e.g. a matrix) into JavaScript objects (e.g. JSON); the one you might be interested in is the RJSON package in the Omegahat project (www.omegahat.org). I don’t know how JS can communicate with other languages in real time (it is only a scripting language); could you please give me an example?
Thanks for your suggestion. I like JSON, because it makes transmitting data very easy.
I think JS could be viewed as a program executing in modern browsers, like Java on the JRE, so there are many ways, such as AJAX, to transmit data. I have learned that PHP could act as an intermediary for this communication by using the PHP “exec” function. Details are here: http://www.stanford.edu/~mjockers/cgi-bin/drupal/node/25.
BTW, are there any books that concentrate just on R language programming rather than the statistical aspects? For example, data[data[1,]=="a",] may return a vector or an array; how can I write clean code that does not use an if statement? And why should each item in a data.frame have the levels property, and how do I update that property when the data.frame has been changed?
Thanks for your generous
I guess you have to accumulate experience in R little by little, since R is a really huge statistical programming environment.
Very nice. Q: Is there any way for the SWF call to return a string (via the URL) rather than invoking the URL for the word, so that the calling JavaScript can itself act on the word clicked? I.e., put the clicked word into an entry prompt on the screen without having to do a page reload (via the URL redirect)?
I don’t know much about Flash; perhaps this is doable via ActionScript, as we can catch the mouse-click event on an object. But the problem is how to show a web page inside a Flash object; I have no idea about that.
This is very impressive. I was wondering if you know of any way to make the graphics/output from R clickable, so that I can click on certain points and either get the underlying data for the point or get a link to a URL. I have only been able to create static graphs; I am looking for more drill-down or clickable capabilities.
Does anyone know how this is done? I am sure it can be done since R is so powerful, but I don’t know how and cannot find any more info on it.
Thanks.
Ali
P.S. Sorry if this is not the right place for the question. I am just impressed with the work and think that if you can do this, you will surely know how to do what I am looking for, or where to point me. Thanks.
If you want to implement such interactions in Flash through R, I do not know of any direct way yet, but I know a couple of possibilities: (1) the FlashMXML package (in Omegahat); (2) the swfDevice package (still under development on R-Forge); the latter package already has some interaction capabilities AFAIK (see demo: http://swfdevice.r-forge.r-project.org/).
Another possibility is to use SVG, which also supports some interactions. R under Linux has the svg() graphics device, but I don’t know if it supports interaction. Add-on packages include SVGAnnotation (Omegahat) and RSVGTipsDevice (on CRAN). You may need special browsers to view the SVG graphics (e.g. Opera, Firefox).
Yet another possibility is to use R packages that support dynamic/interactive graphics, such as iplots (needs Java), rggobi (needs GGobi), etc. You can use them just like standalone software packages.
If you only want to explore your data by yourself, I recommend the third way; but if you want to show some dynamic graphics to people who do not have special software installed, and you have to use a common output format like Flash, you may as well create a lot of static graphs beforehand and arrange them in a certain order to illustrate your ideas.
Very impressive!
Hello friend… Thanks for sharing this, but I have a problem on my site: I couldn’t change my tag cloud background colour. It’s white and I want to change it to green, so please visit my site so you can understand my problem exactly… and if you have any solution, please mail me or drop a message on my website, please… My site is http://learn-english.co.in
You may add a parameter “bgcolor” to your Flash, e.g.:
Very cool post Yihui, thank you!
Tal
Hi,
Thank you for sharing this wonderful tool with us. I was really impressed by how well it generated the tag cloud. There might be a minor issue in your code when dealing with special characters (e.g. ‘&’), though. The R code calls the function URLencode to encode the whole tag. However, I found that I had to set ‘reserved=T’ for URLencode in order to encode ‘&’. Perhaps that should be included in tagCloud?
Thanks for your suggestion! I’ve added that parameter in tagCloud(). The change will appear in R-Forge in one or two days.
Hi Yihui,
I have a blog at http://computerandcelltrics.blogspot.com/, and I am trying to install this on my blog, but I am failing to set it up. Can anybody please help?
I am asking everyone to help me with this.
Waiting for your answer…
Kanna
If you are using WordPress, just install the plugin “wp-cumulus”
Thank you very much for your reply, but the thing is that I am using Blogger, as you can see by visiting my site. Please suggest something, because I have tried many ways and failed. http://computerandcelltrics.blogspot.com/
Waiting for your reply,
Joseph…
MMMmmuah….. lovely…
Actually, I’ve been searching for this for a few months and finally found it.