Yihui Xie

Two Roads Diverged in a Wood; Take One without Blocking the Other

Yihui Xie / 2018-11-29


When two roads diverge in a yellow wood, you often have to take one as a traveler. In software development, we often face such difficult decisions as developers. Sometimes one road may seem to be “obviously” better than the other, so you make the decision for your potential users, and only support that road.

The only problem is “obviously”. Obviousness is in the eye of the beholder.

When designing the knitr package, I thought it was obvious that the working directory when running a code chunk should be the parent directory of the input file, instead of whatever the current getwd() is. This decision turned out to be the one that users hated most. I think I can still defend my decision, but I can also see their point and cannot deny the confusion caused by this decision. Anyway, I didn’t block the other way.

To Docker or not to Docker

Recently, Patrick, one of the contributors of xaringan, wrote a function decktape() to export slides to PDF via DeckTape, which looked like this originally:

decktape = function(file, output) {
  args = shQuote(c(file, output))
  system2('docker', c(
    'run', '--rm', '-t', '-v', '`pwd`:/slides', '-v',
    '$HOME:$HOME', 'astefanutti/decktape', args
  ))
}

He liked Docker, and I could definitely see the benefits of Docker. DeckTape can run with or without Docker, like any other software packages, but DeckTape is a NodeJS package, so you know… The lovely dependency hell there…

When I saw Patrick’s function, I decided to make it work without Docker, too. The reason was that the alternative way was trivially easy to implement. For conciseness, the four dots .... below indicates identical code in the above function so that you can easily see the extra code I added:

decktape = function(...., docker = Sys.which('decktape') == '') {
  ....
  if (docker) system2(....) else system2('decktape', args)
}

That is low cost in implementation, but high return in usability. There might be users who’d rather deal with the Node dependency hell instead of pulling a Docker image of several hundred megabytes.1 It is not necessary to block their way or convince them why Docker is the ultimate best solution.

BTW, note that the default docker = Sys.which('decktape') == '' means if the user has chosen to install decktape locally and it can be found via the PATH variable, xaringan::decktape() will not use Docker (i.e., docker = FALSE). The default should reflect the user’s intention reasonably well. If the guess is incorrect, they can manually specify docker = TRUE or FALSE.

From File or not from File

Here is another example of two roads in a wood: jeroen/jsonlite#265. I don’t mean to criticize the bestest ninja in the R community (i.e., Jeroen), but this example is very representative in my eyes. I want to talk about it so others can learn from it. Basically, when there is a function that is supposed to deal with text data, it is common that the first argument of the function can be either the text or a file path (or a URL, etc.). A common trick is like this:

process_text = function(x) {
  if (file.exists(x)) x = readLines(x)
  paste(x, collapse = '\n')
}

So both process_text('foo/bar.txt') and process_text(c('a', 'b')) will work. This looks very neat, but the problem is, what if the input x is literally a character string that only happens to be a real file path?2 And what if the user entered a wrong file path by accident?3 The cleverness will bite us in these cases. What I often do is this:

process_text = function(x, content = readLines(x)) {
  paste(content, collapse = '\n')
}

It provides both ways to users. They can process either a text file, or a character string. They know more about their data, and we don’t have to guess for them.

process_text('foo/bar.txt')
# if foo/bar.txt is not meant to be a file path
process_text(content = 'foo/bar.txt')
# signal an error if the file path is wrong
process_text('foo/baz.txt')

Actually jsonlite::fromJSON() is cleverer than that in guessing. It has multiple rules to decide whether the input is text or a file path, but no matter how clever it is, it still can backfire if the user doesn’t have a way to tell it if the input is text or a file path, because guessing, is guessing, after all.

# what if there is really a file named [1]?
xfun::in_dir(tempdir(), {
  f = '[1]'
  writeLines('[1, 2, 3]', f)
  on.exit(file.remove(f))
  jsonlite::fromJSON(f)  # expecting c(1, 2, 3) but returns c(1)
})

FWIW, Jeroen has added a new function jsonlite::parse_json() that only deals with text input, which is great!

In short, we could absolutely love one road, or be clever on one road. Just make sure you have thought about whether it makes sense to block the other road.

This way or that way? Whichever you like.


  1. Well, if you npm install -g decktape, you may also end up with using a lot of your disk space because of the way npm manages packages.
  2. The function should return the character string, instead of reading the file.
  3. The function should signal an error instead of returning the character string of the wrong file path.