Yihui's Blog on Yihui Xie | 谢益辉
https://yihui.org/en/

Bye, RStudio/Posit!
https://yihui.org/en/2024/01/bye-rstudio/
Wed, 03 Jan 2024, by Yihui Xie

Who is down? Me. After more than 10 years at RStudio/Posit, the time has come for me to explore other opportunities. A little over two weeks ago, I was told that I was laid off and my last day would be 2023-12-31. Frankly speaking, I was quite surprised but only for a short moment. I fully respected Posit’s decision, and quickly accepted the conclusion that my contribution no longer deserved a full-time job here. The end of a relationship often does not imply anything wrong or a failure of either party. Instead, it can simply indicate a mismatch, which is normal. People just change. With these amazing years in mind, I left mostly with gratitude in my heart.

# number of days I worked at RStudio/Posit
as.Date('2023-12-31') - as.Date('2013-08-26') + 1

Anyway, I guess this news may surprise some people in the R community and bring up questions or puzzles, so I want to write a blog post to address a few potential questions. If you have more, please feel free to ask me either by comments below or by email.

Acknowledgments

Despite the separation, I hold and value a lot of good memories about RStudio, which I could easily expand into another lengthy blog post, but I will save it for now since I got sick last week and am still recovering. In short, I would like to thank JJ for offering me my first ever full-time job and trusting me for so many years. I thank Joe and Tareef for the long-time mentorship (as well as friendship, I should say). I thank Hadley particularly for the guidance on the bookdown project from 2015 to 2016. I cannot say how much I appreciate Christophe’s help over the years (even a few years before he joined RStudio).

Really, there are too many people from the past ten years that I want to thank. Okay, I will write another post on this in the future after I settle down. As always, I am deeply grateful to the entire R and open source community for their belief and investment in the tools that I have been fortunate to work on.

No, those R packages will not be orphaned

At Posit, I felt blessed to work with super talented and committed engineers, and I believe that our collective work (in particular, R Markdown) has helped make R and reproducible research more accessible to the community and hopefully to the world as a whole.

After my departure from Posit, we are not going to drop these efforts. Posit has generously provided funding for me to continue, as a contractor, to support and extend knitr, rmarkdown, and various packages in this ecosystem. I look forward to continuing my collaboration with the Posit team on our shared areas of interest.

So please do not panic—our existing R packages will still be maintained. The only exception is the DT package, which is not included in the contract, and Posit plans to find a new maintainer for it. Before that happens, I might still be able to help (time permitting), but I cannot promise.

A minimalist has been growing inside me

Over the past three years, I have spent more time thinking and exploring a different approach to building software that is more minimalist and handcrafted than the larger projects like Shiny and Quarto, on which Posit is currently focused. I have become more interested in developing smaller software tools that do fewer things.

To a large extent, I’m leaning towards the “Less is More” or “Worse is Better” philosophy, and I find stoicism and the wabi-sabi concept very appealing. I do not mean my choice is correct or better. All choices are about a series of tradeoffs. I just happen to find a certain choice fits me better. Will I stick to it forever? I do not know.

This philosophical change of mine is not only about software development, but also my daily life. As a result, many friends find it hard to understand me when I ask them not to bring anything but an empty stomach when visiting me—perhaps someone who has visited me before can confirm it in the comments below.

What’s next?

Since I’m no longer a Posit employee, I’m facing some uncertainties now. I need to learn and figure out a few things that are new to me before I can come back to work again. Hopefully this will not take more than a couple of weeks.

The contract work I mentioned above is not enough for me to make a living (well, definitely enough for this minimalist guy but not for the family), so I’m also looking for opportunities that will give me the freedom and flexibility to continue to contribute to the R ecosystem and open source in general. If anyone has a job opening or is interested in my skills, I will be happy to chat, and please feel free to email me.

For now, I have not decided yet whether I want to take a full-time job next or just take this chance to become an independent contractor. It depends on the opportunities that I can get in the next few months.

I have never asked for financial support from the community before, because I have never felt the need (thanks to Posit). Now the situation has become different, and I’m a little concerned about the mortgage number in my account. For the first time, I’m mentioning my Github sponsorship page in my blog: https://github.com/sponsors/yihui. I will be very grateful if anyone could support me for a few months before I transition into the next stable phase of life. I will notify you when I do not need the sponsorship any more so you can cancel it if you are on a monthly tier. I will be happy to offer some casual help in return just as tiny side jobs. For example:

  • Answer your questions (technical or non-technical);

  • Help you optimize your website, or more importantly, cultivate a habit of writing so you can keep writing for the years to come;

  • Advise on how to make your presentation entertaining (but I refuse to sell my precious 20-year old GIF collections);

  • Share my experience on cooking, gardening, badminton, or even setting up a simple Karaoke system at home (now you are highly skeptical if this so-called minimalist gentleman is genuine);

  • As a “down” expert, write a letter to cheer you up if you are down for some reason (since the pandemic, I seem to have become much better at writing letters).

We don’t say goodbye. So actually this is not a “bye” to anyone, but a “hi” to an unknown new journey. I have enjoyed the past decade, and I’m full of curiosity about the future.

Help Needed: Making SearchBuilder Work in the Server Mode in DT
https://yihui.org/en/2023/11/dt-searchbuilder/
Tue, 21 Nov 2023, by Yihui Xie

Three years ago, Xianying Tan helped me add the SearchBuilder extension of DataTables to the DT package. This extension did not exist when I first started developing DT, otherwise I would not have spent countless hours creating a variety of filters in DT by myself and making them work in both the client and server modes. This extension is much, much more flexible than my clumsy “homemade” filters.

Yes, SearchBuilder was made available in DT three years ago, but the problem is that it only works on the client side. That is, if you render a table in Shiny, you cannot use DT::renderDT(server = TRUE) and can only use server = FALSE. The filtering logic is not implemented on the server side.

Users have been asking how to make it work in the server mode, and one asked again last week. Unfortunately I do not really have time for this task, but I think it is an interesting little project, so I’m sharing tips on how it could possibly be implemented, in the hope that someone could pick it up and get it done. I do not think it is technically hard, but it definitely requires some focused time.

Basically, you need to inspect the object q while debugging the internal function DT:::dataTablesFilter, and you will see the parameters sent from SearchBuilder in q$searchBuilder. You need to implement the filters by handling these parameters in R code. Here is how you can get started:

debug(DT:::dataTablesFilter)

library(shiny)
shinyApp(
  fluidPage(DT::DTOutput('foo')),
  function(input, output) {
    output$foo = DT::renderDT(
      data.frame(
        a = sample(26), b = letters,
        c = factor(rep(c('a', 'b'), 13)),
        d = Sys.Date() + 1:26,
        e = Sys.time() + 1000 * (1:26)
      ),
      options = list(dom = 'Qlfrtip'),
      extensions = c('SearchBuilder', 'DateTime')
    )
  }
)

In DT:::dataTablesFilter, you can see how I implemented searching, pagination, and sorting with R code. Similar things need to be done for parameters in q$searchBuilder.

Please let me know if you need more guidance. I’m sure at least a few users will be grateful if this could be done, and so will I.

A Change in the TinyTeX Installation Path on Windows
https://yihui.org/en/2023/11/tinytex-path/
Fri, 17 Nov 2023, by Yihui Xie

Since about a month ago, I have been receiving error reports from TinyTeX users saying “TinyTeX\bin\windows\runscript.tlu:864: no appropriate script or program found: fmtutil” or “I can't find the format file `pdflatex.fmt'!”, which I do not understand.

For the last few days, I scratched my head, banged against the wall, did some research, asked R and LaTeX experts in mailing lists, got reminded of “[[alternative HTML version deleted]]” again (and apologized, of course), dug out an old Windows laptop, proudly created a new user account with my authentic Chinese name for the first time in my life (instead of using Pinyin), went through trial and error, learned a variety of bizarreness of Windows batch scripts as well as the Stack Overflow cures, summoned all Chinese students in my alma mater to test their own Windows machines and the Windows servers in their department, and meditated on the meaning of life for three seconds. Finally I’m happy to announce that I have found a fix and applied it to tinytex (the R package) v0.49.

TL;DR: The fix

If you have run into the above errors when rendering R Markdown or Quarto or LaTeX documents to PDF, you can install the latest version of tinytex from CRAN:

install.packages('tinytex')

Please remember to restart R after installation. Then make sure packageVersion('tinytex') >= '0.49'.

The problem

If your Windows username does not contain spaces or non-ASCII characters, this problem should not affect you.

Sys.getenv('APPDATA')
xfun::is_ascii(.Last.value) && !grepl(' ', .Last.value)
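For readers who do not use R, the same check can be sketched in JavaScript (e.g., run with Node.js). This is only an illustration of the rule stated above (ASCII characters only, no spaces); the function name isSafeTexPath is made up for this example and is not part of tinytex or Quarto.

```javascript
// Sketch: a path is "safe" for TeX Live on Windows if it contains only
// ASCII characters and no spaces. `isSafeTexPath` is a hypothetical name.
function isSafeTexPath(path) {
  const asciiOnly = /^[\x00-\x7F]*$/.test(path);
  return asciiOnly && !path.includes(' ');
}

// APPDATA is normally only defined on Windows, hence the fallback.
const appData = (typeof process !== 'undefined' && process.env.APPDATA) || '';
console.log(appData === '' ? 'APPDATA is not set' : isSafeTexPath(appData));
```

If this prints false, the APPDATA path contains spaces or non-ASCII characters, which is the case this post is about.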

Although I have used LaTeX for nearly two decades, I have learned for the first time (from Akira Kakuto) that TeX Live does not work on Windows when its installation path contains non-ASCII characters. By default, TinyTeX is installed to the path defined by the environment variable APPDATA, which is of the form C:\Users\username\AppData\Roaming. The problem comes from the username in this path, which can contain multibyte characters and cause TeX Live to fail with a lot of error messages like those below:

! warning: kpathsea: configuration file texmf.cnf not found in these directories: 
....
! ...s\username\AppData\Roaming\TinyTeX\bin\windows\runscript.tlu:941: ...s\username\AppData\Roaming\TinyTeX\bin\windows\runscript.tlu:864: no appropriate script or program found: fmtutil
! Running the command C:\Users\username\AppData\Roaming\TinyTeX\bin\windows\fmtutil-user.exe

! kpathsea: Running mktexfmt pdflatex.fmt

! The command name is C:\Users\username\AppData\Roaming\TinyTeX\bin\windows\mktexfmt

In theory, a username containing spaces should be fine, because a space is an ASCII character. However, I have received reports that spaces can be trouble, too. I do not know why (is this recent bug fix in base R relevant?).

The change

With tinytex v0.49, when your APPDATA path contains spaces or non-ASCII characters:

  • If you run tinytex::install_tinytex() to install TinyTeX for the first time on a computer, it will install TinyTeX to the path defined by the environment variable ProgramData, which is typically C:\ProgramData. This path has no spaces or non-ASCII characters, but note that it is hidden by default (which is harmless). In addition, this folder is shared by all users in the system. If you have multiple users, this could be a problem. For example, other users can change or override your installation. If this is a concern, you can specify a different path via the dir argument of install_tinytex(). Remember this path should not contain special characters, either.

  • If TinyTeX has already been installed to APPDATA, you will get a warning message telling you how to move it to ProgramData (you can also move it to another place if you want—just specify a different path to the to argument below). You may have to restart R or even the system after moving TinyTeX.

tinytex::copy_tinytex(to = Sys.getenv('ProgramData'), move = TRUE)

The installation script install-bin-windows.bat has also been updated accordingly.

A potential flaw

The above fix is based on the assumption that ProgramData is writable, which appears to be true according to various tests that I asked for from some students. If it is not true, you will have to specify your own installation path in tinytex::install_tinytex(), or if you use the Windows batch file, you can set the environment variable TINYTEX_DIR (which defaults to APPDATA or ProgramData).

Quarto users

The command quarto install tinytex is also impacted by this problem on Windows, and I have submitted a similar fix to Quarto. Before it is applied, moving TinyTeX by yourself can also fix the problem. The only issue is that for non-R users, there is not an automatic solution like calling an R function, and you will have to move it manually (then run tlmgr path add and also tlmgr postaction install script xetex if you use XeLaTeX).

If any Windows users run into any issues when installing or moving TinyTeX to the ProgramData folder, please feel free to let me know. Thanks!

Convert Definition Lists (`<dl>`) to Frames (`<fieldset>`)
https://yihui.org/en/2023/11/dl-fieldset/
Tue, 07 Nov 2023, by Yihui Xie

When I suggested to a department chair in 2019 that they might consider opening a blog so that all students and faculty in the department could write together (which sounded exciting to me), he expressed a concern that some readers might remember authors’ mistakes in the posts. Considering the reputation of the whole department, that is a valid concern. However, to err is human (and to forgive is divine). Everyone makes mistakes. I often update my old posts to correct mistakes or write a note saying certain information is outdated after a few years.

My notes did not have a formal form. Sometimes a note may be a blockquote, and sometimes it is just a normal paragraph. Today I was thinking how I could make them more consistent. One constraint is that it would be nice if I could express it in pure Markdown. Unfortunately, CommonMark does not support fenced Divs, otherwise I would use a fenced Div like:

::: Note
This post is outdated. Please ignore it.
:::

Eventually I decided to hack at definition lists. A note will be like:

Update on 2023-11-09

:   Please ignore this post. The method no longer works.

This will render to the HTML tag <dl> (with <dt> and <dd> inside).

Next I started to think about styling, and recalled the <fieldset> tag that I learned sixteen years ago.[1] With a few lines of JavaScript, I was able to change <dl> to <fieldset>. Below is a demo:

Notice on 2023-11-07

Thank you for noticing this new notice!

Your noticing it has been noted, and will be reported to the authorities.

If you want to use my JS code, you can load it via:

<script src="https://cdn.jsdelivr.net/npm/@xiee/utils/js/dl-fieldset.min.js" defer></script>

Then you only need to write Markdown:

Title

:   Content

which will be rendered to HTML:

<dl>
  <dt>Title</dt>
  <dd>Content</dd>
</dl>

Then my JS code will convert it to:

<fieldset>
  <legend>Title</legend>
  Content
</fieldset>

Mission complete.
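For the curious, the conversion can be sketched as below. This is a minimal illustration and not necessarily how dl-fieldset.min.js is written; the function name dlToFieldset and the choice to skip lists that do not have exactly one <dt> are my assumptions for this example.

```javascript
// Sketch only: convert one <dl> that has a single <dt> into a <fieldset>.
// `doc` is the document used to create the new elements.
function dlToFieldset(dl, doc) {
  const kids = Array.from(dl.children);
  const dts = kids.filter((el) => el.tagName === 'DT');
  if (dts.length !== 1) return null;  // assumption: leave other lists alone

  const fieldset = doc.createElement('fieldset');
  const legend = doc.createElement('legend');
  legend.innerHTML = dts[0].innerHTML;  // <dt> becomes <legend>
  fieldset.append(legend);
  // move the contents of every <dd> directly into the <fieldset>
  kids.filter((el) => el.tagName === 'DD')
    .forEach((dd) => fieldset.append(...dd.childNodes));
  dl.replaceWith(fieldset);
  return fieldset;
}

// In a browser, convert all definition lists on the page.
if (typeof document !== 'undefined') {
  document.querySelectorAll('dl').forEach((dl) => dlToFieldset(dl, document));
}
```

The script is assumed to run after the page has been parsed (e.g., loaded with the defer attribute), otherwise there would be no <dl> elements to find yet.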


  1. See? This is why you should blog: things you learned, no matter how long ago, will not be wasted.

R Markdown v1: Feature Complete!
https://yihui.org/en/2023/10/markdown-complete/
Sat, 21 Oct 2023, by Yihui Xie

When we say “R Markdown”, we usually refer to the rmarkdown package, which is based on Pandoc and knitr. Prior to the rmarkdown package, there actually existed an older version of R Markdown, which was based on the markdown package instead of Pandoc. Later we called this version “R Markdown v1”.

R Markdown v1 was more or less an experiment, although many people liked it (perhaps because they had suffered for too long from LaTeX). It did not take long before we started developing v2, i.e., rmarkdown. V1 was much less powerful than v2. For example, it only supported HTML output but not LaTeX or any other output format. The now widespread CommonMark specs did not exist at that time, so v1’s Markdown syntax was chaotic, just like that of pretty much any other Markdown conversion tool (each having its own homemade or wild-caught specs) except Pandoc.

After R Markdown v2 became mature, v1 did not seem to be of much value any more. Perhaps it would just quietly fade out and eventually die. But…

But Jeroen Ooms, the great ninja, created the R package commonmark later. That changed the destiny of the markdown package. Previously, markdown was based on a C library, which had been deprecated for a long time. Last year, I removed the C library from markdown, and rewrote the package based on commonmark.

Although I’m a minimalist, commonmark’s Markdown features are too limited in my eyes. On the other hand, Pandoc’s Markdown is too rich for me. What I did in the markdown package was a compromise. You can read the introduction vignette to learn which features are supported in this package.

If you prefer reading slides over documentation, I gave a talk in May, which was not recorded, but you will not miss anything by only reading the slides.

This post is not meant to encourage people to use R Markdown v1. On the contrary, I think v2 and Quarto are better choices for most people. I just want to mention the revived markdown package, and there is a small chance that it actually meets some people’s need.

Declaring “feature complete” is hard, and it is definitely not a firm rejection of all future feature requests. It only means that “being feature-rich” is not the goal of this package. In particular, new features that require substantial work are unlikely to be added. Please feel free to request new features, but without a high expectation that they will be implemented.

Feature complete!

P.S. Currently, the markdown repo is the only GitHub repo I maintain that has zero open issues. For years, I thought Will Landau was the only person on earth who could possibly achieve this.

A Simple HTML Article Format
https://yihui.org/en/2023/10/html-article/
Fri, 20 Oct 2023, by Yihui Xie

A couple of weeks ago I wrote about a lightweight HTML presentation format, snap slides. Today I want to briefly introduce a simple and lightweight HTML article format. You can find an example page here, which is also its documentation. You can generate an article like this with either the R package markdown or Quarto or any other tool that can generate HTML output. Under the hood, this article format is based on CSS and JS, which can be reused on any web page.

If you use R Markdown or plain Markdown, you can install.packages('markdown') and specify the output format:

output:
  markdown::html_format:
    meta:
      css: ["default", "@xiee/utils/css/article.min.css"]
      js: ["@xiee/utils/js/sidenotes.min.js,appendix.min.js"]

If you use Quarto, you can also include the relevant CSS and JS:

format:
  html:
    minimal: true
    toc: true
    include-after-body:
      text: |
        <script src="https://cdn.jsdelivr.net/npm/@xiee/utils/js/sidenotes.min.js" defer></script>
        <script src="https://cdn.jsdelivr.net/npm/@xiee/utils/js/appendix.min.js" defer></script>
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/combine/gh/rstudio/markdown@1.11/inst/resources/default.min.css,npm/@xiee/utils/css/article.min.css">
        <style type="text/css">header{text-align: center;}</style>

If you use other HTML generators, you can reuse the HTML code in the above Quarto example (in the text field).

The main features of this article format are: side elements (TOC, footnotes, sidenotes, and references), full-width elements, floating quotes, margin embedding, and appendices. Again, you can read its documentation to know more.

Create Tabsets from HTML Sections or Bullet Lists via JavaScript and CSS
https://yihui.org/en/2023/10/section-tabsets/
Thu, 12 Oct 2023, by Yihui Xie

As I wrote last month, code folding was the most requested feature in blogdown, of which I have given an implementation. Today I will demonstrate an implementation of another top requested feature: tabsets.

How a tabset works

The mechanism of tabsets is fairly simple. It boils down to a click event on a tab link, which triggers the display of a corresponding tab pane. The user interface in HTML is like this:

<div class="tabset">
  <div class="tab-link">Tab 1</div>
  <div class="tab-link">Tab 2</div>

  <div class="tab-pane">Pane 1</div>
  <div class="tab-pane">Pane 2</div>
</div>

If the first tab link is clicked, we can add a class, say, active, to both the first link and the first pane.

  <div class="tab-link active">Tab 1</div>
  <div class="tab-pane active">Pane 1</div>

With some simple CSS, we can control the visibility of panes, and style the clicked link differently, e.g.,

.tab-pane {
  display: none;
}
.tab-pane.active {
  display: block;
}
.tab-link.active {
  border: 1px solid;
}
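The click handling described above can be sketched in a few lines of vanilla JavaScript. This is only an illustration of the mechanism under the HTML structure shown earlier, not the actual source of tabsets.js; the function name activateTab is made up for this example.

```javascript
// Sketch: make the i-th link and pane active, and all others inactive.
function activateTab(links, panes, i) {
  links.forEach((el, j) => el.classList.toggle('active', j === i));
  panes.forEach((el, j) => el.classList.toggle('active', j === i));
}

// In a browser, wire up every tabset on the page.
if (typeof document !== 'undefined') {
  document.querySelectorAll('.tabset').forEach((tabset) => {
    // use direct children only, so nested tabsets are handled separately
    const kids = Array.from(tabset.children);
    const links = kids.filter((el) => el.classList.contains('tab-link'));
    const panes = kids.filter((el) => el.classList.contains('tab-pane'));
    links.forEach((link, i) => {
      link.addEventListener('click', () => activateTab(links, panes, i));
    });
    // show the first pane if no link was marked active initially
    const start = links.findIndex((el) => el.classList.contains('active'));
    activateTab(links, panes, Math.max(start, 0));
  });
}
```

Together with the CSS above, toggling the active class is all that is needed: the inactive panes stay hidden via display: none.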

My implementation

There are several existing implementations of tabsets (e.g., in Bootstrap). The problem is that they are usually not tailored to Markdown users, and you have to prepare the appropriate HTML code by yourself. I have done an implementation today that works for both Markdown and HTML users.

You can find the source code tabsets.js and tabsets.css in my Github repo misc.js. For users, you certainly do not need to read the source, but can use it directly:

<script src="https://cdn.jsdelivr.net/npm/@xiee/utils/js/tabsets.min.js" defer></script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@xiee/utils/css/tabsets.min.css">

If you are not satisfied with the styling, you can provide your own CSS and do not have to use my CSS.

Demo tabs

Below is an example tabset.

First tab

Hello world! Ciao!

Second tab

Here is a table.

x y z
1
2
3

More text.

A level-4 heading

Third tab

Nested tabs!

Tab 1

Isn’t it cool? Keep going!

Tab 2

Where am I now?

You can also create a tabset using raw HTML:

Div pane 1

Div pane 2

You can keep nesting but I’ll stop here.

Fourth tab

Enough tabs? Let me show a tabset created from a bullet list instead of section headings:

  • First bullet

    Hi, bullet!

  • Second bullet

    This is the initial active tab.

  • Third bullet

    Bye, bullet!

Okay, I’m done now.

Documentation

HTML users

If you know HTML and prefer writing HTML, the required DOM structure has been mentioned in the first section of this post. Basically, you provide a container element with the class tabset (it does not have to be a <div>). Inside the container, you have a number of elements with the class tab-link, and the same number of elements with the class tab-pane.

When the i-th link is clicked, the i-th pane will be shown. You can set a certain tab link to be active initially by adding the class active to its HTML tag.

Note that you can have nested tabsets, e.g., a tabset inside a tab pane of a parent tabset.

Markdown users

If you prefer writing Markdown to be rendered to HTML by other tools (e.g., Hugo or the R package markdown), here is how you create a tabset:

  1. Start with an element with the class tabset. This can be any type of element. For example, a heading:

    ## Demo tabs {.tabset}
    

    or an empty <div>:

    <div class="tabset"></div>
    
  2. Below this element, write either a bullet list or a series of sections.

    • If you write a bullet list, the first element of each bullet item will become the tab link, and the rest of elements will become the tab pane, e.g.,

      * Tab one
      
        Content of tab one.
      
      * Tab two <!--active-->
      
        Content of tab two.
      

      I’d recommend this method since it is easier and more natural to create a tabset. However, please make sure to indent the tab pane content properly in the bullet list (using the visual mode to write Markdown in RStudio can help a lot).

      To specify an initial active tab, add a comment <!--active--> to the bullet item.

    • If you write sections, the first section heading level will be the level of headings to be converted to tabs, e.g.,

      ### First tab (level-3)
      
      Some tab content.
      
      ### Second tab
      
      More tab content.
      
      #### A normal heading
      
      This is a level-4 heading, so it will *not* be
      converted to a tab.
      

      You can set a certain tab to be active initially by adding the class active to the heading, e.g.,

      ### Second tab {.active}
      

      One downside of using section headings to create a tabset is that the headings may be included in the table of contents of a page, which is why I do not recommend this method, unless you must specify an active tab manually.

  3. If you use sections to create a tabset, there are two ways to end the tabset (if you create a tabset with a bullet list, you do not need a special way to end it—it just ends where the list ends):

    1. Either start an upper-level heading (e.g., level 2 for the previous example), e.g.,

      ## My tabs {.tabset}
      
      ### Tab one
      
      ### Tab two
      
      ## A new level-2 section
      
      The previous tabset will be ended before this section.
      
    2. or write an HTML comment of the form <!-- tabset:ID -->, where ID is the ID of the element in Step #1, e.g.,

      ## My tabs {.tabset #my-tabs}
      
      ...
      
      <!-- tabset:my-tabs -->
      
      The previous tabset will be ended before this comment.
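The heading-level rules for section-based tabsets can be summarized with a small sketch. It only models the rules described above (the first heading sets the tab level, deeper headings stay inside a pane, and an upper-level heading ends the tabset); it does not handle the <!-- tabset:ID --> comment, and tabHeadings is a hypothetical name, not a function from tabsets.js.

```javascript
// Sketch of the rule only: `levels` are the heading levels (1-6) that
// follow the element with the class `tabset`, in document order.
// Returns the indices of the headings that would become tabs.
function tabHeadings(levels) {
  if (levels.length === 0) return [];
  const tabLevel = levels[0];  // the first heading sets the tab level
  const tabs = [];
  for (let i = 0; i < levels.length; i++) {
    if (levels[i] < tabLevel) break;           // an upper-level heading ends the tabset
    if (levels[i] === tabLevel) tabs.push(i);  // same level: a new tab
    // deeper headings stay inside the current tab pane
  }
  return tabs;
}

// two level-3 tabs, a level-4 heading inside the second tab, then a
// level-2 heading that ends the tabset
console.log(tabHeadings([3, 3, 4, 2, 3]));  // [ 0, 1 ]
```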
      

You can nest tabsets in other tabsets if you want, e.g.,

<div class="tabset"></div>

- Tab one

  Content

- Tab two

  Content

  <div class="tabset"></div>

    - Child tab one

      Content

    - Child tab two

      Content

- Tab three

I hope you can find this simple tabset implementation useful (it is not tied to blogdown or Hugo). Please feel free to let me know if you have any suggestions or comments.

Three Useful Functions in Base R: `regexec()`, `strrep()`, and `append()`
https://yihui.org/en/2023/10/three-functions/
Wed, 11 Oct 2023, by Yihui Xie

I just finished reviewing a pull request in the knitr repo that tries to improve the error message when it fails to parse YAML, and I feel three base R functions are worth mentioning to more R users. I have been inspired by Maëlle Salmon’s 3 functions blog series, and finally started writing one by myself.

regexec(): get substrings with patterns

If you want to master string processing using regular expressions (regex) with base R, the two help pages ?grep and ?regexp are pretty much all you need. Although I had read them many times in the past, I did not discover regexec() until about three years ago, even though this function was first introduced in R 2.14.0 (2011-10-31).

This function gives you the positions of substring groups captured by your regular expressions. It will be much easier to understand if you actually get the substrings instead of their positions, which can be done via regmatches(), another indispensable function when you work with functions like regexec() and regexpr(). For example:

x = 'abbbbcdefg'
m = regexec('a(b+)', x)  # positions
regmatches(x, m)  # substrings
[[1]]
[1] "abbbb" "bbbb"

The length of the returned value depends on how many () groups you have in the regular expression. In the above example, the first value is the whole match (abbbb is matched by a(b+)), and the second value is for the first group (b+) (any number of consecutive b’s).

If you do not know regexec() or regmatches(), it is natural to do substr() like the aforementioned pull request originally did:

message = e$message
regex = "line (?<line>\\d+), column (?<column>\\d+)"
regex_result = regexpr(regex, message, perl = TRUE)
starts = attr(regex_result, "capture.start")
lengths = attr(regex_result, "capture.length")
line_index = substr(message, starts[,"line"], starts[,"line"] + lengths[,"line"] - 1)
column_index = substr(message, starts[,"column"], starts[,"column"] + lengths[,"column"] - 1)
line_index = as.integer(line_index)
column_index = as.integer(column_index)

Its goal is to extract a line and column number from a string of the form "line x, column y". I rewrote the code (using my obnoxious one-letter-variable-name style) as:

x = e$message
r = "line (?<row>\\d+), column (?<col>\\d+)"
m = regmatches(x, regexec(r, x, perl = TRUE))[[1]][-1]
row = as.integer(m['row'])
col = as.integer(m['col'])

Note that (?<NAME>...) means a named capture, so you could later extract the substrings by names instead of numeric indices, e.g., m['row'] instead of m[1]. But this is not important. It is okay to use a numeric index.

BTW, if you are new to regular expressions and not sure if you should use perl = TRUE or FALSE (often the default) in the regex family of functions, I’d recommend perl = TRUE. Perl-compatible regular expressions (PCRE) should cause you fewer surprises and are more powerful.

strrep(): repeat a string a number of times

How many times have you done this?

paste(rep('abc', 10), collapse = '')

I have done it numerous times. Now, no more rep() or paste(). Use strrep() instead:

strrep('abc', 10)

It is even vectorized like most other base R functions, e.g.,

strrep(c('abc', 'defg'), c(3, 4))

I do not want to pretend that I have always known everything—in fact, I did not discover this function until about two years ago.

It is common to generate N spaces like the original pull request did:

spaces = paste(rep(" ", column_index), collapse = "")
cursor = paste(spaces, "^~~~~~", collapse = "")

And I rewrote it as:

cursor = paste0(strrep(" ", col), "^~~~~~")

append(): insert elements to a vector

Maëlle has mentioned append() in her post. Interestingly, it could be used in this pull request, too. Original code:

split_indexes = seq_along(meta) <= line_index
before_cursor = meta[split_indexes]
after_cursor = meta[!split_indexes]
error_message = c(
  "Failed to parse YAML: ", e$message, "\n",
  before_cursor,
  cursor,
  after_cursor
)

New code:

x = c("Failed to parse YAML: ", x, "\n", append(meta, cursor, row))

I remember when I first learned S-Plus in 2004, I was surprised to see that a classmate had written a t.test() function by herself (which was actually cool), and she was equally surprised when I told her that there was a built-in t.test() function. I think similar things still happen today. If you are not aware of regexec(), strrep(), or append(), it is easy and tempting to reinvent them, which can make your code lengthy and complicated.

An Example of Simplifying a Decade-Old Piece of JavaScript
https://yihui.org/en/2023/10/simplify-javascript/
Wed, 11 Oct 2023, by Yihui Xie

Ten years ago, I wrote a piece of JS code to add a button to toggle the visibility of all R code blocks on an HTML page, which partially implemented the code-folding feature. The original code was this:

function toggle_R() {
  var x = document.getElementsByClassName('r');
  if (x.length == 0) return;
  function toggle_vis(o) {
    var d = o.style.display;
    o.style.display = (d == 'block' || d == '') ? 'none':'block';
  }

  for (i = 0; i < x.length; i++) {
    var y = x[i];
    if (y.tagName.toLowerCase() === 'pre') toggle_vis(y);
  }
}

document.write('<button type="button" onclick="toggle_R();" style="position: absolute; top: 0; right: 0;">Hide/Show Source</button>')

When I was thinking about code-folding again a few weeks ago, I dug out the above code, and felt it was worth commenting on after a decade. The rewritten version of the above code can be found at the end of this post.

Bye, jQuery! Hi, vanilla JS!

As a JS amateur, my biggest change over these years was that I stopped using jQuery. Because I was an amateur, I thought jQuery was the “right” way to write JS. A lot of code that I saw was using the magical $(), which made me believe the only way to select elements on the page was $(). I was aware of methods like document.getElementById() and document.getElementsByClassName(), but sometimes I wanted a more flexible method to select elements (not just by their IDs or class names). You can see in the above code that I first tried to get all elements by the class name r:

x = document.getElementsByClassName('r');

and then check if the tag name of an element is pre:

for (i = 0; ...) {
  if (x[i].tagName.toLowerCase() === 'pre') ...
}

That was awkward.

It was quite a few years later that I discovered the document.querySelector() and document.querySelectorAll() methods in vanilla JS. At that moment, I could not believe it was that simple. You can use CSS selectors to select elements! Want to select <pre> with the class r?

document.querySelectorAll('pre.r');

That’s it. So straightforward.

In retrospect, I’d blame Internet Explorer (IE) for the lack of support, especially IE6, which I had used for several years. Ironically, I remember how excited I was when I first saw IE6 come out: it looked pretty on the pretty Windows XP! Today I still think XP was pretty, but IE? IE wasted my life. Again, ironically, I remember how excited I was when I figured out how to support file uploads for Shiny in IE8/9, but why did I have to spend time on that in the first place when all other web browsers did not require this special care?

Anyway, it is a relief that IE has pretty much died now.

To be fair, jQuery is a nice abstraction and has many merits. The problem is that they are rarely what I need now, so I’d rather not take this dependency (otherwise I will have to pay attention to its updates). Vanilla JS is often good enough for me. If you miss the terse dollar sign, I just learned yesterday (!) that you could create one by yourself, e.g.,

const $ = document.querySelectorAll.bind(document);
$('pre.r');

It is not equivalent to jQuery’s $, but can be a nice and useful shorthand anyway. Chrome has done something similar to this in the Developer Tools ($() is basically document.querySelector(), and $$() is document.querySelectorAll()).
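The .bind(document) part matters because the method looks up `this` when it is called, and a bare reference to a method forgets which object it came from. Below is a minimal sketch of that behavior. It uses a made-up stand-in object (fakeDocument and its elements array are hypothetical) so that it can run outside a browser:

```javascript
// fakeDocument is a hypothetical stand-in for `document`; it only mimics the
// fact that the method relies on `this`.
const fakeDocument = {
  elements: ['pre.r', 'pre.js', 'pre.r'],
  querySelectorAll(selector) {
    return this.elements.filter(el => el === selector);
  }
};

// A bare reference to the method forgets which object it belongs to.
const unbound = fakeDocument.querySelectorAll;
let unboundFailed = false;
try {
  unbound('pre.r');
} catch (e) {
  unboundFailed = true;  // `this` is no longer fakeDocument
}

// bind() fixes `this` permanently, so the shorthand works.
const $ = fakeDocument.querySelectorAll.bind(fakeDocument);
const matches = $('pre.r');
```

With the real document object, the unbound call fails in a similar way (browsers throw a TypeError), which is why the shorthand is created with bind().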

BTW, it took me a few more years to realize that the querySelector() method could be used on any DOM element, not just document. I have been this slow in learning.

Bye, for loops! Hi, .forEach()!

I was too used to writing for loops in JS, and there had been a pattern like this:

for (i = 0; i < x.length; i++) {
  y = x[i];
  // more work on y
}

There is no need to use a loop or create y (or the looping index i) when x is an array and you want to deal with its elements one by one. The forEach() method is much cleaner.

x.forEach(y => {
  // more work on y
});

When x is not an array but an array-like object such as the return value of getElementsByTagName(), you can convert it to an array first using the spread syntax (...), e.g.,

[...document.getElementsByTagName('pre')].forEach(() => {});

If you read the code at the beginning of this post carefully, you may notice that I actually created a global variable i inadvertently: I should have used for (var i = 0) instead of for (i = 0). Using the forEach() method can avoid this problem.
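The accidental global can be reproduced in a small sketch. It assumes nothing from the original page: leakedIndex and strictIndex are made-up names, and the Function constructor is used only because its body runs in sloppy mode unless it opts in to strict mode, which makes it easy to show both behaviors in one file.

```javascript
// Sloppy mode: assigning to an undeclared name silently creates a global.
const sloppyLoop = new Function(
  'for (leakedIndex = 0; leakedIndex < 3; leakedIndex++) {}'
);
sloppyLoop();
const leaked = globalThis.leakedIndex;  // the loop index is now a global

// Strict mode: the same mistake is reported as an error instead.
const strictLoop = new Function(
  '"use strict"; for (strictIndex = 0; strictIndex < 3; strictIndex++) {}'
);
let strictError = null;
try {
  strictLoop();
} catch (e) {
  strictError = e;  // assignment to an undeclared variable
}

// forEach() needs no index at all, so there is nothing to leak.
const seen = [];
['a', 'b', 'c'].forEach(y => seen.push(y));
```

Strict mode (which ES modules use by default) would have caught the original mistake right away.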

To be clear, I still use for loops nowadays, but just not that often. Loops have their advantages, e.g., you can break a loop early when necessary.
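Here is a rough sketch of both points, using a plain array-like object as a stand-in for a DOM collection (the object and its values are invented for illustration). One caveat: the spread syntax requires an iterable. DOM collections are iterable, but a plain array-like object such as the one below is not, so the sketch uses Array.from(), which accepts both kinds.

```javascript
// A plain array-like object (indexed keys plus `length`); it stands in for a
// DOM collection so that the sketch runs without a browser.
const arrayLike = { 0: 'pre', 1: 'div', 2: 'pre', 3: 'p', length: 4 };

// Array.from() accepts any array-like object, iterable or not.
const tags = Array.from(arrayLike);

// forEach() always visits every element; it has no way to stop early.
let visitedByForEach = 0;
tags.forEach(() => { visitedByForEach++; });

// A for...of loop can `break` as soon as the answer is known.
let visitedByLoop = 0;
let firstDiv = null;
for (const tag of tags) {
  visitedByLoop++;
  if (tag === 'div') {
    firstDiv = tag;
    break;
  }
}
```

Here forEach() touches all four elements, whereas the loop stops after the second one.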

Bye, function() {}! Hi, () => {}!

I use => to create functions now because it is shorter and still easy enough for me to read. This is purely a cosmetic preference.

I still use the function keyword when the function is meant to be called in several places. I prefer the => shorthand when I want to create a function to be used once somewhere (e.g., an anonymous function to be used as an event handler).

Bye, document.write()! Hi, .insertAdjacentHTML()!

I do not know why document.write() was prevalent when I started learning JS. Perhaps I learned through some fake tutorials. To add some HTML code to the DOM, I’d use the insertAdjacentHTML() method now. One problem with document.write() is that if you run it in the JS console of the browser, it can overwrite the full HTML code of the page, and you certainly do not wish to destroy the whole page.

Normally I’d avoid inserting raw HTML code, but sometimes I’m just too lazy to call document.createElement() and insert the new element with the .before() or .after() methods.

No global variables if possible

Creating global variables means potential clashes and pollution of the global namespace. In my original code, I created a global function toggle_R(), because I wanted to call it in the onclick event of the button. Now I would not write the event handler in the onclick attribute of the <button>, but select the button in JS and attach the handler to it instead. The latter way will not create a global variable.
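The idea can be sketched without a browser, assuming a runtime that provides the standard EventTarget and Event classes (browsers and recent Node.js versions do). The EventTarget object below is only a stand-in for the real button, and toggleSource and clicks are made-up names:

```javascript
// EventTarget stands in for the <button> element so the sketch runs outside a
// browser; in a page you would select the real button instead.
const clicks = [];
(() => {
  const button = new EventTarget();
  // The handler lives only inside this function scope.
  const toggleSource = () => clicks.push('toggled');
  button.addEventListener('click', toggleSource);
  button.dispatchEvent(new Event('click'));  // simulate one click
})();

// Nothing named toggleSource was added to the global namespace.
const leakedHandler = typeof globalThis.toggleSource;
```

The handler still runs on each click, but it cannot clash with any other script on the page because its name never leaves the enclosing function.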

Simplifying a conditional expression

Toggling the visibility of an element can be achieved by changing its display property in CSS. For a code block, display: none means to hide it, and display: block means to show it. Originally my code was like this:

el.style.display = (el.style.display === 'block' || el.style.display === '') ? 'none' : 'block';

Now I’d write:

el.style.display = (el.style.display === 'none') ? 'block' : 'none';

They are equivalent under the assumption that display == '' means the block is shown, which is not strictly true (a stylesheet rule could hide the element while its inline display is empty) but true most of the time. If we compare the display value against none first, we can get shorter and simpler code than when we compare display against block or an empty string.
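To see where the two expressions agree, they can be written as pure functions of the current inline display value and compared over a few inputs. This is only a sketch: toggleOld and toggleNew are made-up names, and 'inline' is just one example of a value outside the assumed set.

```javascript
// The two expressions from the text, as functions of the inline `display`.
const toggleOld = d => (d === 'block' || d === '') ? 'none' : 'block';
const toggleNew = d => (d === 'none') ? 'block' : 'none';

// Compare them over a few possible values.
const values = ['', 'block', 'none', 'inline'];
const agree = values.filter(d => toggleOld(d) === toggleNew(d));
const differ = values.filter(d => toggleOld(d) !== toggleNew(d));
```

The two versions agree on '', 'block', and 'none', which are the only values this toggle ever produces or expects. They differ only for other values such as 'inline', where the old version switches the element to 'block' and the new one hides it.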

From 15 lines of code to 8: shorter, safer, and easier to reason about

Below is the “modernized” version of the code:

(d => {
  d.insertAdjacentHTML('beforeend','<button style="position:absolute;top:0;right:0;z-index:2;">Toggle Source</button>');
  d.lastElementChild.onclick = () => {
    d.querySelectorAll('pre.r').forEach(el => {
      el.style.display = (el.style.display === 'none') ? 'block' : 'none';
    });
  };
})(document.body);

The construct (d => {})(document.body) may feel like ninja code, but it is just because I’m too lazy to write:

(() => {
  const d = document.body;
})();

We added the button at the end of the document body, so d.lastElementChild is the button, and we attached a click event handler to it. If you run the code, you should see a button at the top-right of the page. If you click on it, it should toggle R code blocks on the page (if there are any).

I guess JS experts may find it funny that I used to think I could not live without jQuery, and that my discovery of querySelectorAll() changed the way I use and write JavaScript. I find it funny, too. In programming, abstraction libraries are often nice, but sometimes if we look closer, we may find that all we need is actually a tiny core feature that has existed for a long time in the base, and for some reason, we have missed it for years.

Two Hidden Ways to Set Global Chunk Options for knitr
https://yihui.org/en/2023/10/opts-chunk/ Mon, 09 Oct 2023 00:00:00 +0000 Yihui Xie

Sometimes you may not like the default values of knitr’s chunk options, and you know how to change them (i.e., knitr::opts_chunk$set()), but it can be tedious to do it for every single document. How can we change default chunk options globally in a system?

One approach is to set these options in your .Rprofile, if you know what this file is and how to edit it. However, this approach has a drawback—it will load knitr for any R session, even if you do not need to use knitr in a certain session. To avoid that, you could set a package hook for knitr in .Rprofile, e.g.,

setHook(packageEvent('knitr', 'onLoad'), function(...) {
  knitr::opts_chunk$set(message = FALSE, warning = FALSE)
})

With this hook, knitr will not be loaded immediately when R starts up. The change of chunk options will occur only when knitr is being loaded (e.g., when you compile a document).

If you find the above approach too technical to understand, there are two other ways to modify chunk options globally:

  1. You can set global R options of the form options(knitr.chunk.NAME = VALUE) in .Rprofile, where NAME is the chunk option name, and VALUE is the desired value. For example,

    options(knitr.chunk.message = FALSE, knitr.chunk.warning = FALSE)
    

    These options will be recognized by knitr, which will run opts_chunk$set() for you internally.

  2. Another way is for people who prefer environment variables over .Rprofile. You can set the environment variable R_KNITR_OPTIONS to a comma-separated value like this:

    R_KNITR_OPTIONS="knitr.chunk.message=FALSE,knitr.chunk.warning=FALSE"
    

    Basically, this value will be passed to options(), after which it is handled in the same way as the first method.

These two methods have existed for years but never been documented, because I was not sure if they could be useful to anyone else (I use them occasionally). Recently I seem to have seen a use case, so I’m writing them down here.
