The pandoc() function in knitr (available since version 1.2) converts Markdown documents to other formats such as LaTeX/PDF, HTML, Word (docx), and OpenDocument (odt). The main idea is to shorten the command-line call by moving the Pandoc arguments into a configuration file or into configurations embedded in the document. Normally we call Pandoc from the command line like this:
pandoc -s --mathjax --number-sections --bibliography=foo.bib -o output.html input.md
It is tedious to type the same command again and again. pandoc() executes a command like the one above from R via system(), but reads the Pandoc arguments from a config file, so we can write all the arguments in the file once and then simply call pandoc().
Follow the instructions on the Pandoc website to install Pandoc, and pandoc() will be ready to use.
If you have no experience with the command line, you can try this function without any configuration. Write a Markdown file, say foo.md, and pass it to pandoc():

    library(knitr)
    pandoc('foo.md', format='html')   # HTML
    pandoc('foo.md', format='latex')  # LaTeX/PDF
    pandoc('foo.md', format='docx')   # MS Word
    pandoc('foo.md', format='odt')    # OpenDocument
But you often need custom options like the ones shown at the beginning. Now we explain how to pass such options to Pandoc.
A simple example
Suppose you want to convert Markdown to HTML with the arguments in the first command-line example. You can write a config file like this (<empty> here means you should leave this option empty):

    format: html
    s: <empty>
    mathjax: <empty>
    number-sections: <empty>
    bibliography: foo.bib
    o: output.html
You can save it as foo.txt and run

    library(knitr)
    pandoc('input.md', format='html', config='foo.txt')
Then knitr will parse this config file and turn it into Pandoc arguments. Empty options such as mathjax become bare flags (--mathjax), and non-empty options such as o are followed by their values (-o output.html).
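The translation from options to arguments can be sketched in a few lines of R. The helper config_to_args() below is a hypothetical name used for illustration, not knitr's actual implementation; it assumes the options arrive as a named character vector in which an empty string marks an option without a value.

```r
# Hypothetical helper (not part of knitr): turn a named character vector
# of options into Pandoc command-line arguments.
config_to_args <- function(opts) {
  unlist(lapply(names(opts), function(name) {
    short <- nchar(name) == 1
    # one-letter options take a single dash, longer names take two
    flag <- paste0(if (short) "-" else "--", name)
    value <- opts[[name]]
    # an empty option becomes a bare flag, e.g. --mathjax
    if (!nzchar(value)) return(flag)
    # otherwise attach the value: "-o output.html" or "--bibliography=foo.bib"
    paste0(flag, if (short) " " else "=", value)
  }))
}

# The options from the config file above (format is left out on purpose,
# since it selects the record rather than being passed as a flag here)
opts <- c(s = "", mathjax = "", "number-sections" = "",
          bibliography = "foo.bib", o = "output.html")
args <- config_to_args(opts)
cat("pandoc", args, "input.md\n")
```

With these options, the cat() call prints the same command line that was shown at the beginning of this page.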
The config file
The config file is essentially a Debian Control File. Here are some rules:
- the option name and value are separated by a colon (:)
- an option value can span multiple lines, but every line after the first must be indented with white space
- blank lines are used to separate records (paragraphs)
The first rule is simple. For the second rule, consider bibliography: when there are multiple bibliography databases to be passed to Pandoc, we can write the config file as

    bibliography: paper1.bib
      paper2.bib
      paper3.bib

In this case, it is converted to --bibliography=paper1.bib --bibliography=paper2.bib --bibliography=paper3.bib and passed to Pandoc.
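As a rough illustration of this expansion, the sketch below splits a multi-line value into one argument per line. The function name expand_option() is hypothetical, and the sketch assumes that the lines of the value have been joined with newline characters by whatever read the config file.

```r
# Hypothetical helper (not part of knitr): repeat a long option once for
# every line of a multi-line value.
expand_option <- function(name, value) {
  # each line of the value is one item; the indentation is dropped
  items <- trimws(strsplit(value, "\n", fixed = TRUE)[[1]])
  paste0("--", name, "=", items)
}

bib_args <- expand_option("bibliography",
                          "paper1.bib\n  paper2.bib\n  paper3.bib")
print(bib_args)
```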
The third rule is useful when we define multiple output formats in the config file; below is an example with two records, one for html and one for latex:

    format: html
    s: <empty>
    mathjax: <empty>
    number-sections: <empty>
    bibliography: foo.bib
    o: output.html

    format: latex
    latex-engine: xelatex
    s: <empty>
    number-sections: <empty>
    output: test.pdf
With this config file, we can call

    pandoc('input.md', format='latex', config='foo.txt')

and we will get the PDF file test.pdf.
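Because the config file follows the Debian Control File layout, base R's read.dcf() can be used to inspect its records. The snippet below is only a sketch of how one record might be picked out by its format field, not necessarily how pandoc() handles the file internally; the temporary file and the variable names are made up for the demonstration.

```r
# The two-record config, written to a temporary file for this demo
config <- c(
  "format: html", "s:", "mathjax:", "number-sections:",
  "bibliography: foo.bib", "o: output.html",
  "",  # a blank line separates the two records
  "format: latex", "latex-engine: xelatex", "s:",
  "number-sections:", "output: test.pdf"
)
cfg_file <- tempfile(fileext = ".txt")
writeLines(config, cfg_file)

# read.dcf() returns a character matrix: one row per record, one column
# per field, with NA where a record does not define a field
records <- read.dcf(cfg_file)

# pick the record for the requested format and drop the undefined fields
latex <- records[records[, "format"] == "latex", ]
latex <- latex[!is.na(latex)]
print(latex)
```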
The name of the config file is obtained from getOption('config.pandoc') by default, which means you can set options(config.pandoc = 'path/to/your/config.file') as a global option. If this option is not set, the pandoc() function will look for a file foo.pandoc, where foo is the base name of the input file, e.g. it looks for test.pandoc if the input file is test.md. In other words, the config file has the same name as the Markdown file except that it has a different extension.
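The lookup order described here can be mimicked with a small function. find_config() is a hypothetical name for illustration only; the sketch simply prefers the global option and otherwise swaps the file extension for .pandoc.

```r
# Hypothetical helper (not part of knitr): decide which config file applies
find_config <- function(input) {
  cfg <- getOption("config.pandoc")
  # fall back to <base name>.pandoc when the global option is not set
  if (is.null(cfg)) cfg <- paste0(tools::file_path_sans_ext(input), ".pandoc")
  cfg
}

find_config("test.md")  # "test.pandoc" unless config.pandoc has been set
```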
Sometimes we want to share a few common options across different output formats. For instance, --number-sections can be used for both PDF and HTML output. A record that does not contain the format tag is treated as common options for all formats. Now we can rewrite the above config file as:

    s: <empty>
    number-sections: <empty>

    format: html
    mathjax: <empty>
    bibliography: foo.bib
    o: output.html

    format: latex
    latex-engine: xelatex
    output: test.pdf

Here s and number-sections are extracted to a separate record without a format tag.
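One way to picture how a common record and a format-specific record fit together is the sketch below. merge_options() is a hypothetical helper, and letting the format-specific record win when both define the same option is an assumption made for this example; the text above does not say how knitr resolves such a clash.

```r
# Hypothetical helper (not part of knitr): combine the common record with
# the record of the requested format.
merge_options <- function(common, specific) {
  # ASSUMPTION: an option defined in both records takes the value from
  # the format-specific record
  c(common[setdiff(names(common), names(specific))], specific)
}

common <- c(s = "", "number-sections" = "")
latex  <- c(format = "latex", "latex-engine" = "xelatex", output = "test.pdf")
merged <- merge_options(common, latex)
print(names(merged))
```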
We may want to make the Markdown file self-contained in the sense that the configurations are embedded in it, so we do not need to rely on an external config file. In this case, we can use a special comment <!--pandoc --> in the Markdown file.

    <!--pandoc
    format: html
    s:
    mathjax:
    number-sections:
    bibliography: foo.bib
    o: output.html
    -->
Now we can pass a single file to other people and they will be able to call pandoc() to convert it to the expected format.
If both the config file and embedded configurations are found, they will be combined as if they were from a single file.
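To make the embedded form concrete, the sketch below pulls the configuration lines out of the special comment. extract_config() is a hypothetical helper written for this page, not knitr's own code, and it assumes the comment opens at the start of a line with <!--pandoc and closes with --> at the end of a line.

```r
# Hypothetical helper (not part of knitr): return the config lines found
# in the first <!--pandoc ... --> comment of a Markdown document.
extract_config <- function(lines) {
  start <- grep("^<!--pandoc", lines)
  if (length(start) == 0) return(character(0))
  start <- start[1]
  end <- grep("-->\\s*$", lines)
  end <- end[end >= start]
  if (length(end) == 0) return(character(0))
  block <- lines[start:end[1]]
  # strip the comment delimiters from the first and last lines
  block[1] <- sub("^<!--pandoc\\s*", "", block[1])
  block[length(block)] <- sub("\\s*-->\\s*$", "", block[length(block)])
  # trim empty lines at both ends but keep blank lines between records
  nonempty <- which(nzchar(trimws(block)))
  if (length(nonempty) == 0) return(character(0))
  block[min(nonempty):max(nonempty)]
}

md <- c("<!--pandoc", "format: html", "s:", "o: output.html", "-->",
        "", "# A title")
embedded <- extract_config(md)
print(embedded)
```

The extracted lines have the same layout as an external config file, so they could be appended to the lines of such a file before parsing.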