Jun 10, 2009
A tag cloud is a set of words drawn in a plot with sizes proportional to their frequencies; it is widely used in blogs to visualize tags. Important words stand out quickly in a tag cloud because they appear in a large font size. Tony N. Brown asked how to “graphically represent frequency of words in a speech” on the R-help list the other day, which is really a tag cloud problem:

I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. [...]

Marc Schwartz mentioned that Gorjanc Gregor did some work on this years ago using R (with grid graphics). The obstacle to creating a tag cloud in R, as Gorjanc wrote, lies in deciding where to place the words, and it is much easier for other applications such as browsers to arrange the text. That’s true: there are already many mature programs for drawing tag clouds. One of them is the wp-cumulus plugin for WordPress, which uses a Flash object to generate the tag cloud and gives it a fantastic 3D rotation effect.

1. Arranging text labels with pointLabel()

Before showing how to port the plugin to R, I’d like to introduce the R function pointLabel() in the maptools package, which can partially solve the problem of arranging text labels in a plot (using simulated annealing or a genetic algorithm). Here is a simulated example:

Simulated Tag Cloud with R function pointLabel() in maptools

library(maptools)
set.seed(123)
x = runif(19)
y = runif(19)
w = c("R", "is", "free", "software", "and", "comes",
    "with", "ABSOLUTELY", "NO", "WARRANTY", "You", "are", "welcome",
    "to", "redistribute", "it", "under", "certain", "conditions")
par(ann = FALSE, xpd = NA, mar = rep(2, 4))
plot(x, y, type = "n", axes = FALSE)
pointLabel(x, y, w, cex = runif(19, 1, 5))

I was fortunate to get a very neat graph with no overlapping labels, but I don’t think this is a good solution, as it takes no care of the initial locations of the words. My rough idea for deciding the initial locations is to sample on circles with radii inversely proportional to the frequency, i.e. let x=\sin(\theta)/\textrm{freq} and y=\cos(\theta)/\textrm{freq} where \theta\sim U(0,2\pi). In this case, important words will be placed near the center of the plot.
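A minimal sketch of this idea, assuming the radius shrinks as the frequency grows so that frequent words land near the center. The words, the frequencies in freq and the 1/freq scaling are made up for illustration; base text() is used here and does not resolve overlaps, so these coordinates are only starting positions that a label-placement routine could refine.

```r
# Sketch: sample each word on a circle whose radius decreases with frequency
set.seed(123)
w = c("R", "is", "free", "software", "and", "comes", "with",
    "ABSOLUTELY", "NO", "WARRANTY")
freq = c(10, 2, 6, 8, 1, 2, 1, 4, 3, 5)  # made-up frequencies
theta = runif(length(w), 0, 2 * pi)      # random angle for each word
r = 1/freq                               # assumption: radius = 1/frequency
x = r * sin(theta)
y = r * cos(theta)
par(ann = FALSE, xpd = NA, mar = rep(2, 4))
plot(x, y, type = "n", axes = FALSE)
# font size grows with frequency; overlaps are not handled here
text(x, y, w, cex = 1 + 4 * freq/max(freq))
```

The most frequent word ends up closest to the origin; whether 1/freq or some other decreasing function gives the best layout is an open choice.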

2. Creating tag cloud in a Flash movie using R

The problem becomes quite easy with the Flash movie tagcloud.swf and the JavaScript program swfobject.js. The mechanism, briefly, is that JavaScript passes the tag information to the Flash object, which reads the variable tagcloud where the sizes, colors and hyperlinks of the tags are stored. Finally the tags are visualized as a rotating cloud.

It’s not difficult to pass the tag information to JavaScript as plain text. Below is a function which by default creates an HTML page with a tag cloud Flash movie inside it:

Download the source code: tagCloud.r.gz (1.18Kb)
#------------------------------------------------------------------------------#
# generating tag cloud in R using Flash and SWFObject                          #
# tagData: a data.frame containing columns 'tag', 'link', 'count' and optional #
#     columns 'color' and 'hicolor'                                            #
# other parameters are self-explaining if you are familiar with                #
#     the WP plugin 'wp-cumulus'                                               #
#------------------------------------------------------------------------------#
tagCloud = function(tagData, htmlOutput = "tagCloud.html",
    SWFPath, JSPath, divId = "tagCloudId", width = 600, height = 400,
    transparent = FALSE, tcolor = "333333", tcolor2 = "009900",
    hicolor = "ff0000", distr = "true", tspeed = 100, version = 9,
    bgcolor = "ffffff", useXML = FALSE, htmlTitle = "Tag Cloud",
    noFlashJS, target = NULL, scriptOnly = FALSE) {
    if (missing(SWFPath))
        SWFPath = "http://www.roytanck.com/wp-content/plugins/wp-cumulus/tagcloud.swf"
    if (missing(JSPath))
        JSPath = "http://www.roytanck.com/wp-content/plugins/wp-cumulus/swfobject.js"
    if (missing(noFlashJS))
        noFlashJS = "This will be shown to users with no Flash or Javascript."
    tagXML = sprintf("<tags>%s</tags>", paste(sprintf("<a href='%s' style='%s'%s%s%s>%s</a>",
        tagData$link, tagData$count, if (is.null(target))
            ""
        else sprintf(" target='%s'", target), if (is.null(tagData$color))
            ""
        else ifelse(is.na(tagData$color), "", sprintf(" color='0x%s'",
            tagData$color)), if (is.null(tagData$hicolor))
            ""
        else ifelse(is.na(tagData$hicolor), "", sprintf(" hicolor='0x%s'",
            tagData$hicolor)), tagData$tag), collapse = ""))
    # the file name must match the xmlpath variable passed to Flash below
    if (useXML)
        cat(tagXML, file = file.path(dirname(htmlOutput), "tagcloud.xml"))
    cat(ifelse(scriptOnly, "",
    sprintf("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"
    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
    <html xmlns=\"http://www.w3.org/1999/xhtml\">
    <head>
    <title>%s</title>
    <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
    </head>
    <body>",
        htmlTitle)), sprintf("\t<script type=\"text/javascript\" src=\"%s\"></script>",
        JSPath), sprintf("\t<div id=\"%s\">%s</div>", divId,
        noFlashJS), sprintf("\t<script type=\"text/javascript\">
        \t\tvar so = new SWFObject(\"%s\", \"tagcloud\", \"%d\", \"%d\", \"%d\", \"#%s\");
        %s\t\tso.addVariable(\"mode\", \"tags\");\n\t\tso.addVariable(\"tcolor\", \"0x%s\");
        \t\tso.addVariable(\"tcolor2\", \"0x%s\");\n\t\tso.addVariable(\"hicolor\", \"0x%s\");
        \t\tso.addVariable(\"tspeed\", \"%d\");\n\t\tso.addVariable(\"distr\", \"%s\");
        %s\n\t\tso.write(\"%s\");\n\t\t</script>\n",
        SWFPath, width, height, version, bgcolor, ifelse(transparent,
            "\t\tso.addParam(\"wmode\", \"transparent\");\n",
            ""), tcolor, tcolor2, hicolor, tspeed, distr, ifelse(useXML,
            "\t\tso.addVariable(\"xmlpath\", \"tagcloud.xml\");",
            sprintf("\t\tso.addVariable(\"tagcloud\", \"%s\");",
                tagXML)), divId), ifelse(scriptOnly, "", "</body>\n\n</html>"),
        # file = "" makes cat() print to the console; ifelse() cannot return a connection
        file = if (scriptOnly) "" else htmlOutput, sep = "\n")
}

The main argument is tagData which is a data.frame containing at least three columns (tag, link and count) and looks like:

> head(tagData)
                tag                                        link count
1 2D Kernel Density http://yihui.name/en/tag/2d-kernel-density/     1
2         algorithm         http://yihui.name/en/tag/algorithm/     1
3         Animation         http://yihui.name/en/tag/animation/    11
4           AniWiki           http://yihui.name/en/tag/aniwiki/     2
5            Arcing            http://yihui.name/en/tag/arcing/     1
6          arrows()            http://yihui.name/en/tag/arrows/     1

Additional columns color and hicolor will be used if they exist (hexadecimal numbers specifying RGB), e.g.

> head(tagData)
                tag                                        link count  color hicolor
1 2D Kernel Density http://yihui.name/en/tag/2d-kernel-density/     1 2163bb  f0763d
2         algorithm         http://yihui.name/en/tag/algorithm/     1 9f0f38  d825b1
3         Animation         http://yihui.name/en/tag/animation/    11 800130  5b8d6a
4           AniWiki           http://yihui.name/en/tag/aniwiki/     2 7ce1df  6607b0
5            Arcing            http://yihui.name/en/tag/arcing/     1 df4e4a  f5cdf2
6          arrows()            http://yihui.name/en/tag/arrows/     1 31f5fb  19d50d
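For readers who want to try the function without the downloadable data, here is a small sketch of how such a data frame could be built by hand. The tags, the example.com links and the helper randHex() are hypothetical and serve only to show the expected column layout.

```r
# Hypothetical tags and URLs, for illustration only
tagData = data.frame(
    tag = c("Animation", "AniWiki", "algorithm"),
    link = sprintf("http://example.com/tag/%s/",
        c("animation", "aniwiki", "algorithm")),
    count = c(11, 2, 1),
    stringsAsFactors = FALSE
)
# optional columns: 6-digit hexadecimal RGB strings without a leading "#"
set.seed(1)
randHex = function(n) sprintf("%06x", sample.int(256^3, n) - 1L)
tagData$color = randHex(nrow(tagData))
tagData$hicolor = randHex(nrow(tagData))
tagData
```

Setting stringsAsFactors = FALSE keeps the columns as character vectors in older versions of R, where data.frame() converted strings to factors by default.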

3. Example

Here is an example visualizing my blog tags. You may want to download the following swf and js files first to make loading faster (by default your browser has to fetch these two files from roytanck.com).

Download the tag cloud Flash file tagcloud.swf (33.7Kb) and JavaScript swfobject.js (5.94Kb) as well as the data tagData.gz (1.43Kb).
tagCloud(tagData)
# use tagCloud(tagData, SWFPath = "tagcloud.swf", JSPath = "swfobject.js")
#    if you have downloaded these files to your work directory, i.e. getwd(),
#    this will save you a few seconds loading the flash

The above code will generate an HTML page like this:

[Embedded Flash tag cloud; requires Flash and JavaScript]


You can adjust the parameters as you wish.

4. Other issues

There is still one more step to answer Tony’s original question, namely splitting the speech into single words and computing the frequency. This can be (roughly) done by strsplit(..., split = " ") and table().
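A rough sketch of that last step, under a few assumptions: the text in speech is made up, the split uses a regular expression instead of a single space so that punctuation is dropped, words are lower-cased before counting, and the link column is a placeholder "#" because a speech has no natural URLs.

```r
speech = "We can do it. Yes, we can! We will do it together."  # made-up text
# split on anything that is not a letter or an apostrophe
words = tolower(unlist(strsplit(speech, "[^[:alpha:]']+")))
words = words[nzchar(words)]  # drop empty strings, if any
freq = sort(table(words), decreasing = TRUE)
# arrange the counts in the layout tagCloud() expects
tagData = data.frame(tag = names(freq), link = "#",
    count = as.vector(freq), stringsAsFactors = FALSE)
head(tagData)
```

Real speeches would probably also need common words such as "the" and "of" removed before plotting, which this sketch does not attempt.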

Encoding problems may exist in the above code, but URLencode(tagXML) could be of help.
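As a small illustration of what URLencode() does to a tag, the sketch below compares the default call with reserved = TRUE on a made-up string. By default reserved characters such as "&" are left alone, so a tag containing "&" probably needs reserved = TRUE; whether the Flash movie decodes the result correctly in every case is not verified here.

```r
tag = "R & D <beta>"  # made-up tag with special characters
enc1 = URLencode(tag)                   # "&" is reserved and stays as-is
enc2 = URLencode(tag, reserved = TRUE)  # "&" is percent-encoded as well
enc1
enc2
```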

Only Latin characters are supported, but it is possible to modify the Flash source file to support other languages. See Roy Tanck’s post for more information.

Other R resources I know so far:

  • The R package R4X by Romain François: you can generate an HTML page containing the tags with dynamic classes attached to the <span> tags (install the package and read its vignette: install.packages('R4X', repos='http://r-forge.r-project.org'); vignette('r4xslides', package='R4X'))
  • The R package snippets by Simon Urbanek: there is a function cloud() to create a word cloud; words are arranged from top to bottom and left to right

36 Responses to “Creating Tag Cloud Using R and Flash / JavaScript (SWFObject)”

  1. Paolo says:

    Great post as usual Yihui! Very neat and useful!

  2. john says:

    Yihui, thanks very much for this!

    Question: If you set useXML = T, do you have to specify any additional parameters or alter the XML file in some way? The cloud works just fine with the plain text, but nothing displays when I try to use the XML option.

    • Yihui Xie says:

      Sorry, I didn’t realize that the Flash file 'tagcloud.swf' would not work with an XML file which was not in the same directory with 'tagcloud.swf'. When you use useXML = TRUE, you need to make sure 'tagcloud.swf' and 'tagcloud.xml' are in the same directory.

      I’ve updated the source code tagCloud.r.gz (with a warning when 'tagcloud.swf' could not find the XML file), or you can install the fun package from R-Forge about one day later: install.packages("fun", repos = "http://r-forge.r-project.org").

  3. John Lee says:

    Hi,

    Thanks for the great tool. However, the links do not work. Any suggestions? Thanks.

    John

  4. Desktable says:

    Great post! Thanks.

    I tried to add a tag cloud on my web page but only found out that the tagcloud.swf and swfobject.js links provided in the post do not work. I finally managed to extract the two files used in your homepage (yihui.name). Perhaps you should update the links in this post? Thanks.

    • Yihui Xie says:

      Thanks for the reminder! I’ve changed the links in the R package fun; you may install it and use the function tagCloud (available after install.packages("fun", repos = "http://r-forge.r-project.org")).

  5. Kolyan says:

    Thank you for your share with community!
    I am looking to change size of the window. Where do I look for it: in javascript or AS?
    Thanks everyone.

    • Yihui Xie says:

      Did you see a line of JavaScript code like this: var so = new SWFObject("http://yihui.name/en/wp-content/uploads/2009/06/tagcloud.swf", "tagcloud", "600", "400", "9", "#ffffff"); That’s where you can modify the width (600px) and height (400px) of the Flash animation.

  6. Carlos says:

    Thank you for sharing. Could you tell me how I could change this tag cloud from being round to being rectangular? Probably obvious to you :smile: Thanks!

  7. Barry Brolley says:

    Yihui,
    thanks for sharing your code. I am not sure if it is working for me?
    After running it says “HTML file created at tagcloud.html ” although I can
    not find any output. can you help me understand what I am missing?
    Barry

    • Yihui Xie says:

      The HTML file will be generated to your work directory by default, and you can browse it by browseURL(paste("file://", file.path(getwd(), "tagCloud.html"), sep = "")). Or just get the work directory by getwd(), go there and look for the file ‘tagCloud.html’.

  8. Barry Brolley says:

    Yihui,
    I got your code to work the other day with a simple data frame consisting of three tags. However, today when I added more tags, I got the “This will be shown to users with no Flash or Javascript” message. I did not change anything else, and do not understand how the data frame could relate to the error message. Any ideas?

    • Yihui Xie says:

      What the message actually says is there are certain errors with the JavaScript or Flash. I guess there must be some special HTML characters in your tags (e.g. check if there are chars like “<” or “&”).

      If you are using the fun package, the argument encode = TRUE in the function tagCloud() might help.

  9. Barry Brolley says:

    Yihui,
    yes, you were right there was a “&” in one of my tags which caused the problem.
    (the Encode=True did not fix it though)..thanks again for taking time out of your
    busy day to help a Javascript newbie!
    Barry

    • Yihui Xie says:

      Thanks for pointing the problem out. I’ll consider it a potential bug, which can be fixed by applying the function below to the tags first:

      htmlspecialchars = function(string) {
          x = c("&", "\"", "'", "<", ">")
          subx = c("&amp;", "&quot;", "&#039;", "&lt;", "&gt;")
          for (i in seq_along(x)) {
              string = gsub(x[i], subx[i], string, fixed = TRUE)
          }
          string
      }

      I’ll commit the change to the fun package soon.

  10. Yihui, one last thing, I noticed the links do not work, the cursor changes to the finger as if to start to go but does nothing .where should I look there?
    Barry

  11. Yihui, I added “target=_blank” to the HREF part of the HTML to get the links working..but where should I put your htmlspecialchars in the Javascript though to get the special characters?
    your favorite pest Barry

  12. Fenglei says:

    Well, it seems hard for me to use = instead of “<-” :grin:
    The post showed me a way to generate an .html file with R. Is there any way, or any R library, to support communication with JavaScript in real time? Thanks

    • Yihui Xie says:

      I use ‘=’ just because it saves typing time, although R gurus strongly recommend us to use ‘<-’.

      As for your question, all I know is that there are R packages which can convert R objects (e.g. a matrix) into JavaScript objects (e.g. JSON); the one you might be interested in is the RJSON package in the Omegahat project (www.omegahat.org). I don’t know how JS can communicate with other languages in real time (it is only a scripting language); could you please give me an example?

      • Fenglei says:

        Thanks for your suggestion. I like JSON, for it is very easy to transmit data.

        I think JS could be viewed as a program executing on the modern browsers, like JAVA on JRE. So there are many ways like AJAX to transmit data. I have learned that PHP could be the intermedia to accomplish the communication by using the PHP “exec” function. Details are here: http://www.stanford.edu/~mjockers/cgi-bin/drupal/node/25.

        BTW, are there any books that concentrate on programming in the R language rather than on the statistical aspects? For example, data[data[1,]=="a",] may return a vector or an array; how can I write clean code that does not use an if statement? And why should each item in a data.frame have the level property, and how do I update the property when the data.frame has been changed?

        Thanks for your generous ;-)

      • Yihui Xie says:

        I guess you have to accumulate experience in R little by little, since R is a really huge statistical programming environment.

  13. Jaffa says:

    Very nice. Q: Is there any way for the SWF call to return a string (via the URL) rather than invoking the URL for the work. So that the calling javascript can itself act based on the word clicked. Ie, put the word clicked into a entry prompt on the screen, without having to do a page reload (via the URL redirect)?

    • Yihui Xie says:

      I don’t know much about Flash; perhaps this is doable via ActionScript, as we can catch the event of mouse-click on an object. But the problem is how to show a web page inside a Flash object — I have no idea on it.

  14. Ali says:

    This is very impressive. Was wondering if you know of any way to actually make the graphics/output from R so that I can click on certain points and either get the underlying data for the point or get a link to the URL.. I have only been able to create static graphs.. i am looking for more drill down or click able capabilities..

    Anyone know how this is done? I am sure if can be done since R is so powerful, but i don’t know how and cannot find any more info on it.

    Thanks.
    Ali

    ps.. sorry if this is not the right place for the question.. I just am impressed with the work and think if you can do this, i am sure you will know how to do what i am looking to do or will know where to point me, Thanks.

    • Yihui Xie says:

      If you want to fulfill such interactions in Flash through R, I do not know any direct way yet, but I know a couple of possibilities: (1) the FlashMXML package (in Omegahat) (2) the swfDevice package (still under development in R-Forge); the latter package has some capabilities for interaction now AFAIK (see demo: http://swfdevice.r-forge.r-project.org/)

      Another possibility is to use SVG which also supports some interactions. R under Linux has the svg() graphics device, but I don’t know if it supports interaction. Add-on packages include SVGAnnotation (Omegahat) and RSVGTipsDevice (on CRAN). You may need special browsers to view the SVG graphics (e.g. Opera, Firefox).

      Yet another possibility is to use R packages that support dynamic/interactive graphics, such as iplots (needs Java), rggobi (needs GGobi), etc. You can use them just like standalone software packages.

      If you only want to explore your data by yourself, I recommend the third way; but if you want to show some dynamic graphics to other people without special software installed and have to use a common output format like Flash, you may as well create a lot of static graphs beforehand and arrange them in a certain order to illustrate your ideas.

  15. dengyishuo says:

    Very impressive!

  16. Learn English Online says:

    Hello frnd …. Thanks for sharing this .. but I hv problem on my site i could’t change my tag cloud bgcolour Its white & I want to convert in green so please visit my site so exactly u can understand my problem ….& please if u hv any solution mail me or drop massage on my website please…please…. my site is http://learn-english.co.in

  17. Tal Galili says:

    Very cool post Yihui, thank you!

    Tal

  18. shuo says:

    Hi,

    Thank you for sharing with us this wonderful tool. I was really impressed by how well it generated the tag cloud. There might be a minor issue in your code when dealing with special chars (e.g. ‘&’), though. The R code calls function URLencode to encode the whole tag. However, I found that I had to set ‘reserved=T’ for URLencode in order to encode ‘&’. Perhaps that should be included in tagCloud?


WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2010 by Yihui Xie