Yihui Xie

Back To The DT Package After Two Years

Why I disappeared for so long...

Yihui Xie / 2018-01-19


As I maintain more and more R packages, I find it more and more difficult to look back on all my previous packages. The DT package is one of them. I think readers of my blog and those who are familiar with my work know that 2016 was my bookdown year, and 2017 was the blogdown year. Of course, life is never as simple as “(exclusively) one package a year”. During the two years, I have released other packages that I haven’t had time to write about yet (such as printr), and also taken over the maintenance of the rmarkdown package (which is not a trivial task: I have spent several weeks cutting almost 10 pages of GitHub issues down to 2 pages). I wish I could have more time for some other packages, such as animation and tufte, but I can only wish.

My OCD on GitHub issues

Since the beginning of 2018, I have been focusing on the DT package. I had left it behind for about two years, and I have tons of unread GitHub issues in this repo. This is the third week I have been working on it, and so far I have cut the number of issues from about 130 to 38. I have a weird OCD: I don’t want the number of GitHub issues to exceed 25, which means I wish I could see all remaining issues on a single page. Multiple pages of GitHub issues in a repo make me feel uncomfortable. I know it is an illusion, but it makes me feel there is a lot more work to do after I navigate to the second page of issues. Besides, I tend to forget issues that are not on the first page.

Again, please ask questions on StackOverflow instead of GitHub

I have said enough on this topic. Among the large number of GitHub issues, a lot of them are just questions. In most cases, only the package author takes care of the GitHub issues, which means if you ask a question in a GitHub issue, you are pretty much counting on a single person to respond to you. I cannot be highly responsive to every single question. It is beyond my capabilities. That said, I’m still trying my best, even if it means answering questions after one or two years (example).

You can also consider https://community.rstudio.com if you feel uncomfortable with StackOverflow. Personally I watch StackOverflow questions more closely than the RStudio Community (due to lack of time), but the Community should be friendlier to beginners than StackOverflow.

I’d like to thank a few people who helped answer questions in the GitHub issues while I was away, including @carlganz, @shrektan, @XD-DENG, and @fbreitwieser. As I said in my previous post, nothing feels better than someone else answering questions for you.

What is so complicated about DT?

Why couldn’t I work on DT and bookdown or blogdown at the same time? I’m not very good at multitasking, and DT requires a lot of brain power. You may feel it is just a package for tables, and how complicated can tables be? The answer is, they can be extremely complicated. Every single feature may be simple enough, but it is the combination of features that makes things complicated. For example, features A, B, and C may be simple on their own, but you have to make sure the table still works when A and B are enabled simultaneously, or B and C, or A and C, or A, B, and C.
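To make the combinatorial effect concrete, here is a minimal sketch in R that enumerates every on/off combination of three features. The feature names A, B, and C are placeholders taken from the paragraph above, not actual DT options.

```r
# Placeholder feature names, used only for illustration
features <- c("A", "B", "C")

# One row per on/off combination of the features
cases <- expand.grid(
  setNames(rep(list(c(FALSE, TRUE)), length(features)), features)
)

# Total number of cases, including the one with every feature disabled
n_cases <- nrow(cases)

# Cases in which at least two features are enabled at the same time
interactions <- cases[rowSums(cases) >= 2, ]
n_interactions <- nrow(interactions)
```

With three features there are 8 cases in total, and 4 of them (A+B, A+C, B+C, and A+B+C) involve interactions between features. Each additional feature doubles the total.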

DT is based on the JavaScript library DataTables, which is complicated enough on its own. The core features are okay, but the extensions… They are my biggest headache. They do look useful, so users love using them, but users are constantly surprised when they don’t work in certain cases. Again, it is due to the combinatorial effect. For example, the FixedColumns extension may work well on its own, but when the column filters are enabled, the filters may be positioned incorrectly due to the dark JS and CSS tricks involved. Similarly, the ColReorder extension works well in the client-side processing mode, but doesn’t work in the server-side mode.

In theory, when you have $N$ features, you will have to consider $2^N$ cases. I have to consider several other things in DT besides the underlying DataTables library. For example, I need to make sure that DT works well in these cases:

The last case, supporting both client-side and server-side processing, is particularly challenging. For example, column filters, row/column/cell selection, and table editing should all work in both modes. For client-side processing, everything is processed through JavaScript, but for server-side processing, I need to re-implement everything in R. The DataTables library excels at client-side processing, but its server-side processing has a few missing pieces that I have to make up for myself (e.g., the server side does not know whether a search is case-sensitive or not because DataTables does not send this information to the server in the POST request).
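As a rough illustration of what “re-implementing in R” means for the search box, below is a minimal sketch of a server-side global search. The function `search_rows()` and its `case_insensitive` argument are hypothetical and are not DT’s actual implementation. Because the request does not say whether the search is case-sensitive, the server has to receive that choice from somewhere else, here simply as an argument with an assumed default.

```r
# Hypothetical helper (not DT's real code): keep the rows in which at least
# one column contains the search string as a literal substring
search_rows <- function(data, search, case_insensitive = TRUE) {
  if (!nzchar(search)) return(data)  # empty search: nothing to filter
  pattern <- if (case_insensitive) tolower(search) else search
  hits <- lapply(data, function(column) {
    text <- as.character(column)
    if (case_insensitive) text <- tolower(text)
    grepl(pattern, text, fixed = TRUE)
  })
  # A row matches if any of its columns matches
  data[Reduce(`|`, hits), , drop = FALSE]
}

df <- data.frame(
  name = c("Alice", "bob"),
  city = c("Ames", "Boston"),
  stringsAsFactors = FALSE
)

insensitive <- search_rows(df, "BO")                            # matches the second row
sensitive   <- search_rows(df, "BO", case_insensitive = FALSE)  # matches nothing
```

The same search string returns different rows depending on a setting the browser knows but the server was never told, which is exactly the kind of missing piece described above.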

But it is still fun

Despite the pain mentioned above, I think DT is still fun in many ways. Oftentimes I have to hack very deep into different R packages and JS libraries, which feels like “庖丁解牛” (a fable in Chinese literature that describes a chef meticulously dissecting an ox). I’m especially interested in how R and the web browser (JavaScript and web requests) communicate with each other. I think the session$registerDataObj() method is probably one of the most undervalued things in Shiny (I’m bragging here because I wrote this method). The idea is simple, but it can be very powerful. It plays a key role in the server-side processing mode in DT. On a similar note, I think httpuv is also one of the most undervalued packages in R. If you have some knowledge of web development, you should find enormous fun in this package. I’m not an expert in web development, but I wrote a relatively simple package, servr, based on httpuv. Needless to say, I had a lot of fun with it.
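For readers who have not seen session$registerDataObj() before, here is a minimal sketch of the idea, assuming the shiny package is installed. It publishes an R object at a URL that is unique to the session, and a filter function turns each request to that URL into a response. The names `serve_head`, `"cars"`, and `"link"` are made up for this example, and this is not how DT itself uses the method.

```r
# Filter function: receives the registered data and the HTTP request,
# and returns a Rook-style response (status, headers, body)
serve_head <- function(data, req) {
  list(
    status = 200L,
    headers = list("Content-Type" = "text/plain"),
    body = paste(capture.output(print(head(data))), collapse = "\n")
  )
}

# Only start the app in an interactive session with shiny available
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  library(shiny)

  ui <- fluidPage(uiOutput("link"))

  server <- function(input, output, session) {
    # Publish `mtcars` at a session-specific URL; requests go through serve_head()
    url <- session$registerDataObj(
      name = "cars", data = mtcars, filterFunc = serve_head
    )
    output$link <- renderUI(tags$a(href = url, "View the first rows of the data"))
  }

  runApp(shinyApp(ui, server))
}
```

Clicking the link sends an ordinary web request back to R, which answers with whatever the filter function computes from the data and the request.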

As I probably have said a few times before, I’m not capable of developing complicated and important infrastructure packages like DataTables in JavaScript, or httpuv in R, or Hugo in Go, but I love assembling these “nuts and bolts”, and I’m relatively good at it.

DT v0.3, and the future

I’ll prepare a CRAN release of DT soon, so I’ll appreciate it if you could help me test it. You can install it from the GitHub repo rstudio/DT. To see all the changes, take a look at the release notes in NEWS.md. My personal favorites are table editing, Shift + Click row selection (thanks to Xianying Tan), and smart filtering.
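If you want to try the development version, one way to do it is sketched below. It assumes the remotes package is installed (any other tool that installs from GitHub should work as well), and the iris data is used only as an example.

```r
# Install the development version of DT from the rstudio/DT repo on GitHub
remotes::install_github("rstudio/DT")

# Try the table editing feature: double-click a cell to edit its value
DT::datatable(head(iris), editable = TRUE)
```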

I’ll also deeply appreciate it if anyone can help me with the remaining GitHub issues, whether it is answering questions, fixing bugs, or implementing new features. I’ll be happy to video chat with you if you get stuck and need me to explain any source code in this package. I don’t want to leave this package unattended for two years again, but very soon I’ll have to move on to my 2018 project, and I don’t think I’ll be able to commit to DT like I did in the past three weeks.
