Yihui Xie (http://yihui.name, xie@yihui.name)

Markdown or LaTeX? http://yihui.name/en/2013/10/markdown-or-latex/ (2013-10-19)

What happens if you ask for too much power from Markdown?

R Markdown is one of the document formats that knitr supports, and it is probably the most popular one. I have been asked many times about the choice between Markdown and LaTeX, so I think I'd better wrap up my opinions in a blog post. These two languages (do you really call Markdown a language?) are at two extremes: Markdown is super easy to learn and type, but it is primarily targeted at HTML pages, and you do not have fine control over typesetting (really? really?), because you only have a very limited number of HTML tags in the output; LaTeX is relatively difficult to learn and type, but it allows you to do precise typesetting (you have control over everything, which is probably why a lot of time can be wasted).

What is the problem?

What is the root problem? I think one word answers everything: page! Why do we need pages? Printing is the answer.

In my eyes, the biggest challenge for typesetting is to arrange elements properly with the restriction of pages. This restriction seems trivial, but it is really the root of all "evil". Without having to put things on pages, life can be much easier in writing.

What is the root of this root problem in LaTeX? One concept: floating environments. If everything came in a strictly linear fashion, writing would be just writing, and typesetting would be no big deal. But because a graph cannot be broken over two pages, it is hard to find a place to put it, and by default it can float to unexpected places. The same problem can happen to tables (see the end of a previous post). You may have to add or delete some words to make sure floats land in proper places. That is endless trouble in LaTeX.
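To see the trouble concretely: a standard LaTeX figure environment only gives placement suggestions, and LaTeX is free to move the figure somewhere else entirely (the file name below is made up):

```latex
\begin{figure}[tbp] % t, b, p are only suggestions: top, bottom, or a float page
  \centering
  \includegraphics[width=.8\textwidth]{my-plot.pdf} % hypothetical figure file
  \caption{This figure may drift pages away from where it was written.}
\end{figure}
```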

There is no such problem in HTML/Markdown, because there is no page. You just keep writing, and everything appears linearly.

Can I have both HTML and PDF output?

There is no fault in being greedy, and it is natural to ask whether one can have both HTML and PDF output from a single source document. The answer is maybe yes: you can go from LaTeX to HTML, or from Markdown to LaTeX/PDF.

  • pandoc can convert Markdown to almost anything
  • many tools can convert LaTeX to HTML

But remember, Markdown was designed for HTML, and LaTeX was designed for PDF and related output formats. If you ask for more power from either language, the result is not likely to be ideal; otherwise one of them would have to die.

How to make the decision?

If your writing does not involve complicated typesetting and primarily consists of text (especially no floating environments), go with Markdown. I cannot think of a reason why you must use LaTeX to write a novel. See Hadley's new book Advanced R programming for an excellent example of Markdown + knitr + other tools: the typesetting elements in this book are very simple -- section headers, paragraphs, and code/output. That is pretty much it. Eventually it should be relatively easy to convert those Markdown files to LaTeX via Pandoc, and publish a PDF using the LaTeX class from Chapman & Hall.

For the rest of you, what I'd recommend is to think early and make a decision in the beginning; avoid having both HTML and PDF in mind. Ask yourself only one question: must I print the results nicely on paper? If the answer is yes, go with LaTeX; otherwise just choose whatever makes you comfortable. The book Text Analysis with R authored by Matthew Jockers is an example of LaTeX + knitr. Matt also asked me this question about Markdown vs LaTeX last week while he was here at Iowa State. For this particular book, I think Markdown is probably OK, although I'm not quite sure about a few environments in the book, such as the chapter abstracts.

It is not obvious whether we must print certain things. I think we are just too used to printing. For example, dear professors, must we print our homework? (Apparently Jenny does not think so; I saw her grade homework on RPubs.com!) Or dear customers, must we submit reports in PDF? ... In this era of laptops, iPads, Kindles, tablets, and all kinds of electronic devices that can show rich media, why must you print everything (in black and white)?

For those who are still reading this post, let me finish with a side story: Matt, a LaTeX novice, taught himself LaTeX a few months ago, and he has finished the draft of a book with LaTeX! Why are you still hesitating about the choice of tools? Shouldn't you just go ahead and get the * done? Although all roads lead to Rome, some people die at the starting line instead of on the roads.

After Three Months I Cannot Reproduce My Own Book http://yihui.name/en/2013/09/cannot-reproduce-my-own-book/ (2013-09-05)

I thought I could easily jump to a high standard (reproducibility), but I failed.

Some of you may have noticed that the knitr book is finally out. Amazon is offering a good price at the moment, so if you are interested, you'd better hurry up.

I avoided the phrase "Reproducible Research" in the book title, because I did not want to take that responsibility, although it is related to reproducible research in some sense. The book was written with knitr v1.3 and R 3.0.1, as you can see from my sessionInfo() in the preface.

Three months later, several things have changed, and I could not reproduce the book, but that did not surprise me. I'll explain the details later. Here I have extracted the first three chapters, and released the corresponding source files in the knitr-book repository on Github. You can also find the link to download the PDF there. This repository may be useful to those who plan to write a book using R.

What I could not reproduce was not really important. The major change in recent knitr versions was in the syntax highlighting commands, e.g., \hlcomment{} is now \hlcom{}, and syntax highlighting has been improved by the highr package (sorry, Romain). This brought a fair number of changes when I looked at git diff, but they are only cosmetic.

I tried my best to avoid writing into the book anything that is likely to change in the future, but as a naive programmer, I have to say sorry that I have broken two little features, although they may not really affect you:

  • the preferred way to stop knitr in case of errors is to set the chunk option error = FALSE instead of the package option stop_on_error, which has been deprecated (Section 6.2.4);
  • for external code chunks (Section 9.2), the preferred chunk delimiter is ## ---- instead of ## @knitr now;
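To make the two changes concrete, here is a sketch of an external R script using the new delimiter (the chunk label and code are invented); the old ## @knitr form still works for now, and in the main document you would set the chunk option error = FALSE instead of the deprecated package option stop_on_error:

```r
## ---- simulate-data
x <- rnorm(100)
mean(x)
```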

Actually, backward compatibility is still there, so these features will not really break until a long time from now.

With exactly the same software environment, I think I could reproduce the book, but that does not make much sense. Things are always evolving. There are two types of reproducible research:

  1. the "dead" reproducible research (reproducible only in a very specific environment);
  2. the reproducible research that evolves and generalizes;

I think the latter is more valuable. Being reproducible alone is not the goal, because you may be reproducing either real findings or simply old mistakes. As Roger Peng wrote,

[...] reproducibility cannot really address the validity of a scientific claim as well as replication

Roger's three recent blog posts on reproducible research are well worth reading. This blog post of mine is actually not quite relevant (no data analysis here), so I recommend that my readers move over there after checking out the knitr-book repository.

My first Bioconductor conference (2013) http://yihui.name/en/2013/07/bioconductor-2013/ (2013-07-21)

The BioC 2013 conference was held from July 17 to 19. I attended this conference for the first time, mainly because I'm working at the Fred Hutchinson Cancer Research Center this summer, and the conference venue was just downstairs! No flights, no hotels, no transportation, yeah.

Last time I wrote about my first ENAR experience, and let me tell you why the BioC conference organizers are smart in my eyes.

A badge that never flips

I do not need to explain this simple design -- it just will not flip to the damn blank side:

The conference program book

The program book was only four pages: the schedule (titles and speakers). The abstracts are online. Trees saved.

Lightning talks

There were plenty of lightning talks. You can talk about whatever you want.

Live coding

On the developer's day, Martin Morgan presented some buggy R code to the audience (provided by Laurent Gatto), and asked us to debug it right there. Wow!

Everything is free after registration

The registration included almost everything: lunch, beer, wine, coffee, fruit, snacks, and most importantly, Amazon Machine Images (AMI)!

AMI

This is a really shiny point of BioC! If you have ever run a software tutorial, you probably know the pain of setting up the environment for your audience: they use different operating systems and different versions of packages, and who knows what is going to happen once you are on your third slide. At a workshop last year, I spent five minutes figuring out why a keyboard shortcut did not work for one Canadian lady in the audience; it turned out she was using the French keyboard layout.

The BioC organizers solved this problem beautifully by installing the RStudio server on an AMI. Every participant was sent a link to an Amazon virtual machine, and all they needed was a web browser and a wireless connection in the room. Everybody ran R in exactly the same environment.

Isn't that smart?

Talks

I do not really know much about biology, although a few biological terms have been added to my vocabulary this summer. When a talk becomes biologically oriented, I have to give up.

Simon Urbanek talked about big data in R this year, which is unusual, as he himself mentioned. Normally he shows fancy graphics (e.g., iplots). I did not realize the significance of this R 3.0.0 news item until his talk:

It is now possible to write custom connection implementations outside core R using R_ext/Connections.h. Please note that the implementation of connections is still considered internal and may change in the future (see the above file for details).

Given this new feature, he single-handedly implemented HDFS connections and 0MQ-based connections in R (well, that is always his style).

You probably have noticed the previous links are Github repositories. Yes! Some R core members really appreciate the value of social coding now! I'm sure Simon does. I'm aware of other R core members using Github quietly (DB, SF, MM, PM, DS, DTL, DM), but I do not really know their attitude toward it.

Joe Cheng's Shiny talk was shiny as usual. Each time I attend his talk, he shows a brand new amazing demo. Joe is the only R programmer who makes me feel "the sky is the limit (of R)". The audience was shocked when a heatmap they were so familiar with suddenly became interactive in a Shiny app! BTW, Joe has a special sense of humor when he talks about an area in which he is not an expert (statistics or biology).

RStudio 0.98 is going to be awesome. I'm not going to provide links here, since it has not been released yet. I'm sure you will find the preview version if you really want it.

Bragging rights

  • I met Robert Gentleman for the first time!
  • I dared to fall asleep during Martin Morgan's tutorial! (sorry, Martin)
  • some Bioconductor web pages were built with knitr/R Markdown!

Next steps

Given Bioconductor's open-mindedness to new technologies (GIT, Github, AMI, Shiny, ...), let's see if it is going to take over the world. Just kidding. But not completely kidding. I will keep the conversation going before I leave Seattle around mid-August, and hopefully get something done.

If you have any feature requests or suggestions for Bioconductor, I will be happy to serve as the "conductor" temporarily. I guess they should set up a blog at some point.

R Package Versioning http://yihui.name/en/2013/06/r-package-versioning/ (2013-06-27)

This should be what it feels like to bump the major version of your software:

bump the major version

For me, the main reason for package versioning is to indicate the (slight or significant) differences among versions of the same package; otherwise we could keep releasing version 1.0 forever.

That seems to be a very obvious fact, so here are my own versioning rules, with some ideas borrowed from Semantic Versioning:

  1. a version number is of the form major.minor.patch (x.y.z), e.g., 0.1.7
  2. only the version x.y is released to CRAN
  3. x.y.z is always the development version, and each time a new feature or a bug fix or a change is introduced, bump the patch version, e.g., from 0.1.3 to 0.1.4
  4. when one feels it is time to release to CRAN, bump the minor version, e.g., from 0.1 to 0.2
  5. when a change is crazy enough that many users are presumably going to yell at you (see the illustration above), it is time to bump the major version, e.g., from 0.18 to 1.0
  6. the version 1.0 does not imply maturity; it is just because it is potentially very different from 0.x (such as API changes); same thing applies to 2.0 vs 1.0
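Under these rules, the version history of a hypothetical package foo might look like:

```
0.1      on CRAN
0.1.1    development: new feature
0.1.2    development: bug fix
...
0.2      on CRAN
...
0.18     on CRAN
1.0      on CRAN (sweeping API changes; expect yelling)
```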

I learned rule #3 from Michael Lawrence (author of RGtk2), and I think it is a good idea. In particular, it is important for the brave users who dare to install development versions: when you ask them for their sessionInfo(), you will know which stage they are at.

Rule #2 saves us a little bit of energy, in the sense that we do not need to write or talk about the foo package 1.3.548, which is boring to type or say; normally we just say foo 1.3. As a person whose first language is not English, speaking the patch version consumes my brain memory and slows down my thinking while I'm talking. When I say it in Chinese, it feels boring and unnecessarily geeky. Yes, I know I always have weird opinions.

You Do Not Need to Tell Me I Have A Typo in My Documentation http://yihui.name/en/2013/06/fix-typo-in-documentation/ (2013-06-10)

help me with Github pull requests

So I just got yet another comment saying "you have a typo in your documentation". While I do appreciate these kind reminders, I think this might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions asked -- just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically, what you need to do for simple tasks is:

  1. click the Fork button and clone the repository in your own account;
  2. make the changes in your cloned version;
  3. push to your repository;
  4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I'm still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, ...)

The knitr repository has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages branch:

git checkout gh-pages

All pages are written in Markdown, so edit them with your favorite text editor. For example, as the comment above pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md; you just add it, save the file, write a GIT commit message, push to your repository, and send the pull request.
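The edit-and-commit part of that workflow takes only a few commands. Here is a runnable sketch against a throwaway local repository standing in for your fork (all names here are made up; the push and the Pull Request button are the only parts that need Github):

```shell
# stand-in for "git clone <your fork>": create a local repo with a gh-pages branch
git init -q demo-site && cd demo-site
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial commit"   # this is the master branch
git checkout -q -b gh-pages                       # the website branch
mkdir -p _posts
printf 'fixed a missing right parenthesis)\n' > _posts/2012-02-24-sweave.md
git add _posts && git commit -q -m "fix typo in documentation"
# in the real workflow: git push origin gh-pages, then click Pull Request
```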

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let's see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

]]>
A Few Tips for Writing an R Book http://yihui.name/en/2013/06/tips-for-writing-an-r-book/ (2013-06-03)

I just finished fixing (hopefully all) the problems in the knitr book returned from the copy editor. David Smith kindly announced this book before I did. I do not have much to say about the book: almost everything in it can be found in the online documentation, questions & answers, and the source code. The point of buying it, perhaps, is that you do not have time to read through all two thousand questions and answers online, so I did that for you.

the knitr book

This is my first book, and obviously there has been a lot for me to learn about writing one. In retrospect, I want to share a few tips that I found useful (in particular, for those who plan to write for Chapman & Hall):

  1. although it sounds like shameless self-promotion, using knitr made it a lot easier to manage R code and its output for the book; for example, I could quickly adapt to R 3.0.1 from 2.15.3 after I came back from a vacation; if I were to write a second edition, I do not think I would have big trouble with my R code in the book (it is easy to make sure the output is up to date);
  2. I put my source documents under version control, which helped me watch changes in the output closely; for example, thanks to GIT, I noticed that the source code of the function fivenum() in base R changed from R 2.15.3 to 3.0.0 (R core have been updating base R everywhere!);
  3. (opinionated) some people might be very bored to hear this: use LyX instead of plain LaTeX... because you are writing, not coding; LaTeX code is not fun to read...
  4. for the LaTeX document class krantz.cls (by Chapman & Hall):
    • to solve the only stupid problem in LaTeX (i.e., floating environments float to silly places by default), use something like this:

      \renewcommand{\textfraction}{0.05}
      \renewcommand{\topfraction}{0.8}
      \renewcommand{\bottomfraction}{0.8}
      \renewcommand{\floatpagefraction}{0.75}

      I'm aware of the float package and the H option, and options like !tbp; I just do not want to force LaTeX to do anything. It may or may not be happy at some point.

    • put \usepackage{emptypage} in the preamble to make empty pages really empty, as required by the copy editor.
    • the document class krantz.cls does not work with the hyperref package, meaning that you cannot create bookmarks in the PDF; I have posted the solution here.
  5. for authors whose native language is not English like me, here is a summary of my problems in English:
    • when you want to use which, use that instead, unless there is a comma ahead, or you really want to emphasize a very specific object; e.g.,

      "here is a package that is helpful" (correct)

      "here is a package which is helpful" (wrong)

      "we will introduce an extremely important technology next, which has revolutionized the life of poor statisticians"

    • it is "A, B, and C" instead of "A, B and C"

    • do not forget the comma in other places, either: "e.g.,", "i.e.,", "foo and bar, respectively"; actually, try to use the comma whenever possible to break long sentences into shorter pieces
  6. for the plots, use the cairo_pdf() device when possible (in knitr, this means the chunk option dev = 'cairo_pdf'); the reason for choosing cairo_pdf() over the normal pdf() device is that it can embed fonts in the PDF plot files; otherwise the copy editor will require you to embed all the fonts in the final PDF file of the book; normally pdflatex embeds fonts, so if some fonts are not embedded, they are very likely from R graphics;
  7. include as many figures as possible (I have 51 figures in this 200-page book), because this will make the number of pages grow faster (I'm evil) so that you will not feel frustrated, and the readers will not fall into the hell of endless text, page after page;
  8. prepare an extra monitor for copyediting;
  9. learn a little bit about pdftk, because you may need it in the end, e.g., to replace one page in the frontmatter with a blank page;
  10. learn these copy editing symbols (thanks, Matt Shotwell);

One thing I did not really understand was the rule that punctuation marks like commas and periods should go inside quotation marks, e.g.,

I have "foo" and "bar."

This makes me feel weird. I'm more comfortable with

I have "foo" and "bar".

There was also one thing that I did not catch with version control -- one figure file went wrong and I did not realize it, because normally I do not put binary files under version control. Fortunately, I caught it by eye. Karl Broman mentioned the same problem to me a while ago. I know there are tools for comparing images (ImageMagick, for example); I was just too lazy to learn them.

I would be glad to hear about other authors' experiences, and will try to update this post according to the comments.

Travis CI for R! (not yet) http://yihui.name/en/2013/04/travis-ci-general-purpose/ (2013-04-12)

A few days ago I wrote about Travis CI, and was wondering if we could integrate the testing of R packages into this wonderful platform. A reader (Vincent Arel-Bundock) pointed out in the comments that Travis runs Ubuntu, which allows you to install software packages at will.

I took a look at the documentation, and realized they were building and testing packages in virtual machines. No wonder sudo apt-get works. Remember apt-get -h | tail -n1:

This APT has Super Cow Powers. (APT有超级牛力)

R on Travis CI

Now we are essentially system admins, and we can install anything from Ubuntu repositories, so it does not really matter that Travis CI does not support R yet. Below are a few steps to integrate your R package (on Github) into this system:

  1. follow the official guide until you see .travis.yml;
  2. copy my .travis.yml for the knitr package if you want, or write your own;
    • I use a custom library path ~/R to install add-on R packages so that I do not have to type sudo everywhere
    • at the moment I use the RDev PPA by Michael Rutter to install R 3.0.0 since his plan for R 3.0 on CRAN is in May; at that time I'll change this PPA to a CRAN repository
    • since R CMD check requires all packages in Suggests as well, I install knitr using install.packages(dep = TRUE) to make sure all relevant packages are installed
    • make install and make check are wrappers of R CMD build and R CMD check respectively, defined in the Makefile
  3. push this .travis.yml to Github, and Travis CI will start building your package when a worker is available (normally within a few seconds);
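Condensed, such a .travis.yml might look roughly like this (the PPA name, package list, and repository URL are illustrative; see the actual file in the knitr repository for the real configuration):

```yaml
language: c                # Travis has no R support yet, so declare any base language
before_install:
  - sudo add-apt-repository -y ppa:marutter/rdev   # Michael Rutter's RDev PPA
  - sudo apt-get update -qq
  - sudo apt-get install -y r-base-dev texlive
install:
  - mkdir -p ~/R                                   # custom library path: no sudo needed
  - echo 'R_LIBS=~/R' > ~/.Renviron
  - Rscript -e "install.packages('knitr', dependencies = TRUE, repos = 'http://cran.r-project.org')"
script:
  - R CMD build .
  - R CMD check knitr_*.tar.gz
```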

By default you will receive email notifications when the build status changes. You can also find the guide on adding the build status image in the documentation.

What I described here actually applies to any software packages (not only R), as long as the dependencies are available under Ubuntu, or you know how to build them.

But it is still far from CRAN

OK, it works, but we are still a little bit far from what CRAN does, because Travis CI does not have official support for R. Each time, we have to install one gigabyte of additional software to create the R testing environment (sigh, if only R did not have to tie itself to LaTeX). If these packages were pre-built in the virtual machines, it would save us a lot of time.

The second problem is that there is no Windows support on Travis CI (one developer told us on Twitter that it was coming). There is a page for OS X, but I have not really figured out how to build software under OS X there.

The third problem is that Travis CI only builds and tests packages; it does not provide downloads like CRAN does. Perhaps we could upload the packages to our own servers using encryption keys.

R-Forge, where are you going?

I will shut up here since I realized I was not being constructive. Let me spend more time thinking about this, and I love to hear suggestions from readers as well.

So, two potential Google Summer of Code projects:

  • make R an officially supported language on Travis CI (this really depends on whether the Travis team wants it)
  • improve R-Forge (of course, this depends on whether the R-Forge team thinks they need help)
Travis CI for R? http://yihui.name/en/2013/04/travis-ci-for-r/ (2013-04-07)

I'm always worried about CRAN: a system maintained by FTP and emails from real humans (basically one of Uwe, Kurt, or Prof Ripley). I'm worried for two reasons:

  1. the number of R packages is growing exponentially;
  2. time and time again I see frustrations from both parties (CRAN maintainers and package authors);

I have a good solution for 2: keep silent when your submission passes the check system, and say "Sorry!" when it does not, no matter whether you agree with the reason or not (I know this made one maintainer unhappy). But do not argue -- just go back and fix the problem if you know what the problem is, or use dark voodoo to hide (yes, hide, not solve) the problem if you are sure you are right. If you read the mailing list frequently, you probably remember that if (CRAN) discussion. The solution in my mind was if (Sys.getenv('USER') == 'ripley').

The key is, do not argue. Silence is gold.

You shall not pass

The CRAN maintainers have been volunteering their time, and we should respect them. The question is, will this approach scale with the growth of packages? And who should be in charge of R CMD check?

We, the poor authors, cannot guarantee that our packages will pass on CRAN's machines every time, for all kinds of reasons. Some problems are actually easy to fix without a real human yelling at us. On the other hand, if the package fortunately passes R CMD check, we do not really need an email from a real human acknowledging "thanks, on CRAN now".

Travis CI is an excellent platform for continuous integration of software packages. You do not need to interact with a real person by email -- each time you push to Github, your package will be automatically built and checked. If there are problems, you will be notified automatically.

A similar platform in the R world is Bioconductor. It has the two best components of software development: version control (although sadly SVN) and continuous checking. I do not know if CRAN will catch up one day; I'm not very optimistic about it. Perhaps a more realistic approach is to start a Google Summer of Code project on introducing R to Travis CI. I have no idea how difficult that would be, but I will definitely be thrilled if it comes true this year.

Anyone?

Update on 04/16/2013: just to clarify, what Bioconductor does is not strictly continuous integration (yet) in the sense that it builds packages daily instead of immediately on changes.

On ENAR, or Statistical Meetings in General http://yihui.name/en/2013/03/on-enar-or-statistical-meetings-in-general/ (2013-03-14)

Last year I accepted an invitation from Ben to go to ENAR 2013 -- my first ENAR. I used to go to JSM and useR!, and apparently I enjoy useR! the most. The reason is not, or not only, that I'm more of a technical person; it is just hard to concentrate at large statistical conferences. I want to make a few suggestions from the perspective of a student, although it is unlikely that any future conference chairs will come here and listen to me:

  1. Go green and get rid of printed programs. A program book is thick and clunky, and nobody will take it with them when they leave; the hundreds of pages of paper will only end up in the garbage. If you have to print them, print N/5 copies instead (N is the number of participants) and let participants share with each other.
  2. Improve the websites and add social networking features. For example, we could "reserve" the talks we are interested in, so all participants immediately know which talks are popular and highly anticipated, and organizers can schedule appropriately sized rooms. The discussion session by John Chambers, Duncan Temple Lang, Thomas Lumley, and Michael Lawrence at the last JSM in San Diego was a failure in the sense that many people were standing outside the room (presumably fans of John Chambers). By comparison, my session at ENAR was assigned a room of (more than?) 400 seats, but only 20 people showed up.
  3. If we told the organizers which sessions we planned to attend, the program book could be a lot thinner! If there are changes in these sessions, only a small group of people needs to be notified. Notifying everybody by inserting a few announcements is a waste of most people's time.
  4. One thing I really wish conferences had: I want to know which people in my "circle" are also going. It is hard to go through 1000 names of participants and spot the familiar ones. I met a friend at ENAR who had collaborated with me (translating an English R tutorial into Chinese) almost 10 years ago, but we had never met in person, so we did not know each other. I did not know he was coming either. I was just sitting on a sofa in a corner, and he randomly saw my badge. We were so excited that we finally met in such an unexpected place.
  5. If participants could make connections with each other beforehand, it would likely save us money as well -- we could share the costs when we rent cars, take cabs, book hotels, and so on.
  6. So please do not charge us upon registration -- give us a deadline and charge us later. Perhaps I will change my mind if I cannot find enough interesting people to meet, or the popular talks seem irrelevant to me.
  7. I heard from Hadley that some IEEE conferences require speakers to do a 30-second talk before the conference, and I think that would be cool and useful for statistical conferences as well. Nowadays I still hear certain speakers read their slides word by word. Some speakers may be shy or not confident in their spoken English, but I do not think the language problem is really a big problem. My suggestion to these speakers is to spend more time preparing jokes instead of slides: jokes make the audience concentrate and the speaker relax. I have told a lot of stupid jokes that I regretted afterwards, but I think the net effect is still positive.
  8. If it is not possible to arrange 30-second talks, the conference website should allow speakers to upload mini versions of their talks to attract a bigger audience.
  9. Some people go to conferences for both the presentations and sightseeing. Personally I do not care about the latter at all, but unfortunately all big conferences take place in famous big cities. This ENAR was held in the largest Marriott in the world. Am I proud of that? No, not at all, because I had to stay in a much cheaper hotel three miles away. One evening I tried to walk back, and it took me one hour and twenty minutes. What is more, the place is not really walkable -- I had to walk on the grass on the roadside for half an hour because there was no pavement! I do not really mind walking (even for three miles), but walking on grass is not a happy memory. The Marriott was such a closed universe that it was hard to walk out, as Karl's picture below shows:

By comparison, useR! conferences often take place on a university campus. Last year it was at Vanderbilt, and they provided dorms to students. I lived happily in the dorm, because all I wanted was a place to sleep (there was free wireless too); nothing luxurious. Usually there are also inexpensive restaurants on campus. I met helpful local students/researchers there who gave me a free ride to some scenic spots, and I had to fight to pay for my tickets myself (but was still treated). It was just a touching trip, and I managed to make the acquaintance of quite a few interesting people (Frank Harrell, Bill Venables and Kevin Coombes, etc.).

ENAR "kindly" included a ticket for the Epcot theme park in the registration fee. What did I do in the theme park? I had dinner in an expensive restaurant (Karl felt guilty for taking me there and generously treated me), and watched a 10-minute fireworks show. Yes, we were so nerdy that we kept on discussing the role of measure theory and GitHub in a place where we were supposed to say hello to Mickey Mouse.

So my major suggestion to the big statistical meetings is: create an environment that emphasizes communication among people, and do not include distracting activities in the registration package by default. There are a couple of small things you can do, for example:

  1. More seats in open areas so we can sit and chat.
  2. Free beer. I'm sure it is more doable than an Epcot ticket.
  3. Always print the participant's name on both sides of the badge. You know how stupid it is to show a blank side of your badge to other people (and the badge always flips to the damn wrong side, always, flips!!), especially given that statisticians are socially awkward and feel embarrassed to ask other people for their names.
  4. Choose a university campus instead of a Marriott. If you do not know how to choose one, choose Iowa State University then. I'm sure all participants will be able to concentrate, unless they are interested in the corn and pigs on the farms.

Okay, rants ended. Positive energy coming.

As I said on Twitter, I was happy to meet Karl Broman (who later introduced me to Matthew Stephens, the smartest person in statistics and human genetics according to Karl) and John Muschelli there. I had noticed Karl a long time ago, mainly due to his Top ten worst graphs, but had never met him.

I did not know much about the Johns Hopkins biostat department before I visited them last year, and it has become a place that surprises me more and more. It is a weird and crazy department. I like people for bizarre reasons. For instance, I like Jeff Leek because he prefers steak to be well done (me too). Rafa hides jokes under his very serious-looking face. "Behind the Tan Door" is the best video in the history of statistics. Karl has a series of hilarious stories about the JHSPH logo. There are people in the world that you know for sure you will be excited to meet. Currently I have yet another person on my list: Tyler Rinker.

In case you have not seen it, I strongly recommend you read (and apply for) the postdoctoral fellow position in reproducible research at Hopkins. Note in particular the phrase "serious moxie"!!

So do I regret attending ENAR? Certainly not. If I could have organized my time more efficiently and met more people like those above, it would have been even better. BTW, if you are not one of the 20 people who came to my session, feel free to check out my slides on knitr (Brian Bot simply called them "animated gifs" instead of "slides").

Contribute to The R Journal with LyX/knitr http://yihui.name/en/2013/02/contribute-to-the-r-journal-with-lyx-knitr/ 2013-02-17T00:00:00-08:00 Yihui Xie http://yihui.name/en/2013/02/contribute-to-the-r-journal-with-lyx-knitr (This paragraph is pure rant; feel free to skip it) I have been looking forward to the one-column LaTeX style of The R Journal, and it has finally arrived. Last time I mentioned "it does not make sense to sell the cooked shrimps"; actually there is another thing that does not make sense in my eyes, which is the two-column LaTeX style. I just hate it. Two columns may save a little bit of space in typesetting compared to one column, but they are a huge inconvenience for readers who do not have a big enough screen. For each single page, you have to scroll down to read the left column, scroll back up to read the right column, then scroll down again... So you just scroll up and down, up and down, ... until you are bored by this PITA.

A sample R Journal article in LyX/knitr

I have ported the new RJournal.sty to LyX, and you can find the relevant files in my lyx repository. To write articles in LyX with knitr, check out or download the repository and follow these steps:

  1. Find your User directory from the LyX menu: Help --> About;
  2. From my repository, copy the layouts folder to your user directory;
  3. Download RJournal.sty from the R Journal website and put it in your texmf tree so that LaTeX can find it (this might be the most challenging step if you do not know enough about LaTeX, and I do not want to explain this painful topic);
  4. (For Windows users only) make sure R is in your PATH (again this is a painful topic that I hate to explain) and install.packages('knitr') in R;
  5. From LyX, click Tools --> Reconfigure and restart LyX.
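For the file-copying in steps 2 and 3, here is a minimal shell sketch on a Unix-like system. All paths below are assumptions, not prescriptions: check Help --> About in LyX for your real User directory, and your TeX distribution's documentation for where your personal texmf tree lives. A scratch directory stands in for $HOME so the sketch is safe to run anywhere.

```shell
ROOT=$(mktemp -d)                 # scratch stand-in for $HOME
LYX_USER_DIR="$ROOT/.lyx"         # assumed LyX user directory
TEXMF_HOME="$ROOT/texmf"          # assumed personal texmf tree

# stand-ins for the checked-out lyx repository and the downloaded style file
mkdir -p "$ROOT/lyx/layouts"
touch "$ROOT/lyx/layouts/RJournal.layout" "$ROOT/RJournal.sty"

# step 2: copy the layouts folder into the LyX user directory
mkdir -p "$LYX_USER_DIR"
cp -r "$ROOT/lyx/layouts" "$LYX_USER_DIR/"

# step 3: put RJournal.sty in the texmf tree so LaTeX can find it
mkdir -p "$TEXMF_HOME/tex/latex/RJournal"
cp "$ROOT/RJournal.sty" "$TEXMF_HOME/tex/latex/RJournal/"
```

Step 4 is then just install.packages('knitr') from an R console, and step 5 (Reconfigure) happens inside LyX itself.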

Now you should be able to open templates/RJournal.lyx and compile it. I have made a quick video of the process below:

So you have no excuse to escape reproducible research! It is now even easier than writing in Word to contribute a reproducible article to The R Journal.

P.S. I will try to submit this new layout file RJournal.layout as well as the template RJournal.lyx to the LyX development team if I do not hear of any problems from users.
