pkgpurl facilitates R package authoring using a literate programming approach. The main idea is to write all of the R source code in R Markdown files (`Rmd/*.Rmd`), which allows the actual code to be freely mixed with explanatory and supplementary information in expressive Markdown format. The main objective of pkgpurl is to provide a standardized way to compile the bare `.R` files from the prose-enhanced and thus more human-oriented `.Rmd` files.
The basic idea behind the concept this package implements originates from Yihui Xie. See his blog post Write An R Package Using Literate Programming Techniques for more details; it’s definitely worth reading. This package’s function `pkgpurl::purl_rmd()` is just a less cumbersome alternative to the Makefile approach outlined by him.
The R Markdown format provides several advantages over the bare R source format when developing an R package:
👍 Mix Markdown and Code
It allows the actual code to be freely mixed with explanatory and supplementary information in expressive Markdown format instead of having to rely on `#` comments only. In general, this should encourage you to actually record code-accompanying information because you’re able to use the full spectrum of Pandoc’s Markdown syntax like inline formatting, lists, tables, quotations or math¹.
It is especially powerful in combination with the Visual R Markdown feature introduced in RStudio 1.4, which – in addition to the visual editor – offers a feature whose utility can hardly be overestimated: Pandoc Markdown canonicalization (on file save²). For example, it allows paragraphs to be wrapped automatically at the desired line width, or lets you write a minimal, sloppy pipe table that is automatically normalized to a beautifully formatted and actually readable one.
The relevant editor options which adjust the canonical Markdown generation can be set either

- per `.Rmd` file, e.g.

      ---
      editor_options:
        markdown:
          wrap: 160
          references:
            location: section
          canonical: true
      ---

- or per project in the usual `PACKAGE_NAME.Rproj` file, e.g.

      MarkdownWrap: Column
      MarkdownWrapAtColumn: 160
      MarkdownReferences: Section
      MarkdownCanonical: Yes

(I’d recommend setting them per project, so they apply to the whole package including any `.Rmd` vignettes.)
👍 All your code in a single, well-structured file
The traditional recommendation for keeping an overview of your package’s R source code is to split it over multiple files. The popular (and very useful) book R Packages gives the following advice:

> If it’s very hard to predict which file a function lives in, that suggests it’s time to separate your functions into more files or reconsider how you are naming your functions and/or files.

I think this is just ridiculous.
Instead, I encourage you to keep all your code (as far as possible) in a single file `Rmd/PACKAGE_NAME.Rmd` and structure it according to the rules described here, which even allows the pkgdown `Reference:` index to be automatically in sync with the source code structure. As a result, you re-organize (and thus most likely improve) your package’s code structure whenever you intend to improve the pkgdown reference – and vice versa. For a basic example, see this very package’s main source file.
Keeping all code in a single file frees you from the traditional hassle of finding a viable (but in the end still unsatisfactory) way to organize your R source code across multiple files. Of course, there are still good reasons to move code into separate files in certain situations, and nothing is stopping you from doing so. You can also exclude whole `.Rmd` files from purling by using the `.nopurl.Rmd` filename suffix.
👍 Improved overview and navigation
You can rely on RStudio’s code outline to easily navigate through longer `.Rmd` files. IMHO it provides significantly better usability than the code section standard of `.R` files. It makes it easy to find your way around source files that are thousands of lines long.
RStudio’s Go to File/Function shortcut works the same for `.Rmd` files as it does for `.R` files.
👍 Improved visual clarity
If you use RStudio or any other editor with proper R Markdown syntax highlighting, you will probably like the gained visual clarity for distinguishing individual functions/code parts (by putting them in separate R code chunks). This also facilitates creating a meaningful document structure (in Markdown) alongside the actual R source code.
👍 Easily toggle code inclusion
You can put development-only code, which never lands in the generated R source files (and thus the R package), in separate code chunks with the chunk option `purl = FALSE`. This turns out to be very convenient in certain situations.
For example, this is a good way to reproducibly document the generation of cleaned versions of exported data as well as internal data. This avoids having to outsource the code to separate files under `data-raw/` and adding the directory to `.Rbuildignore`, i.e. no need to use `usethis::use_data_raw()`. Instead, you just set `purl = FALSE` for the relevant code chunk(s). You can (and should) still use `usethis::use_data()` (optionally with `overwrite = TRUE`) to generate the files under `data/` holding external package data as well as the `R/sysdata.rda` file (using `internal = TRUE`) holding internal package data.
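The effect of `purl = FALSE` can be tried out with knitr directly. The following is a minimal sketch that assumes knitr is installed; it calls `knitr::purl()` on a made-up two-chunk document rather than `pkgpurl::purl_rmd()`, so it only illustrates the chunk option itself, not pkgpurl’s own processing. The file content and the `add_one()` function are invented for illustration.

```r
library(knitr)

# Three backticks, built programmatically so this example nests cleanly
fence <- strrep("`", 3)

# A tiny stand-in for an Rmd source file (content is made up for illustration)
rmd <- c(
  "# Data",
  "",
  paste0(fence, "{r, purl = FALSE}"),
  "raw <- read.csv('data-raw/input.csv')  # development-only, never purled",
  fence,
  "",
  paste0(fence, "{r}"),
  "add_one <- function(x) x + 1",
  fence
)

input <- tempfile(fileext = ".Rmd")
output <- tempfile(fileext = ".R")
writeLines(rmd, input)

# Extract the code only (documentation = 0 drops chunk headers and prose);
# purl() does not evaluate the chunks, so the CSV file need not exist
knitr::purl(input = input, output = output, documentation = 0, quiet = TRUE)

purled <- readLines(output)
print(purled)
```

The printed result should contain the `add_one()` definition but not the `read.csv()` line, since the first chunk opts out of purling.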
👍 Easily toggle styler
If you use styler to auto-format your code globally by setting `knitr::opts_chunk$set(tidy = "styler")`, you can still opt out on a per-chunk basis by setting `tidy = FALSE`. This gives pleasant flexibility.
Unfortunately, there are also a few notable drawbacks of the R Markdown format:
👎 Additional workflow step
The pkgpurl approach to writing R packages in the R Markdown format introduces one additional step at the very beginning of typical package development workflows: running `pkgpurl::purl_rmd()` to generate the `R/*.gen.R` files from the original `Rmd/*.Rmd` sources before documenting/checking/testing/building the package. Given sufficient user demand, this could probably be integrated into devtools’ functions in the future, so that no additional action has to be taken by the user when relying on RStudio’s built-in package building infrastructure.
For the time being, it’s recommended to set up a custom shortcut³ for one or both of `pkgpurl::purl_rmd()` and `pkgpurl::process_pkg()`, which are registered as RStudio add-ins.
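Outside of RStudio, the extra step can be sketched as a small wrapper that is run from the package root. `purl_and_check()` is a hypothetical helper, not part of pkgpurl or devtools, and the sketch assumes that both packages are installed and that `pkgpurl::purl_rmd()` operates on the current working directory when called without arguments.

```r
# Hypothetical convenience wrapper (not provided by pkgpurl);
# call it with the package root as the working directory
purl_and_check <- function() {
  # 1. Regenerate R/*.gen.R from the Rmd/*.Rmd sources
  pkgpurl::purl_rmd()

  # 2. Continue with the usual devtools workflow on the generated files
  devtools::document()
  devtools::check()
}
```

Calling `purl_and_check()` before each build keeps the generated files from lagging behind the `.Rmd` sources.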
👎 Differing setup
Setting up a new project to write an R package in the R Markdown format differs slightly from the classic approach. A suitable convenience function like `create_rmd_package()` to set up all the necessary parts could probably be added to usethis in the future.
For the time being, you can use my ready-to-go R Markdown Package Development Template as a starting point for creating new R packages in the R Markdown format.
👎 Unwieldy debugging
Debugging can be a bit more laborious since line numbers in warning and error messages always refer to the generated `R/*.gen.R` file(s), not the underlying `Rmd/*.Rmd` source code file(s). If need be, you first have to look up the line numbers in the `R/*.gen.R` file(s) to understand which function / code parts cause the issue in order to know where to fix it in the `Rmd/*.Rmd` source(s).
👎 Missing roxygen2 auto-completion
Unlike in `.R` files, RStudio currently doesn’t support auto-completion of roxygen2 tags in `.Rmd` files, and its Reflow Comment command doesn’t work properly on them. These are known issues which will hopefully be resolved in the near future.
The documentation of this package is found here.
To install the latest development version of pkgpurl, run the following in R:
    if (!("remotes" %in% rownames(installed.packages()))) {
      install.packages(pkgs = "remotes",
                       repos = "https://cloud.r-project.org/")
    }

    remotes::install_gitlab(repo = "rpkg.dev/pkgpurl")

Some of pkgpurl’s functionality is controlled via package-specific global configuration which can either be set via R options or environment variables (the former take precedence). This configuration includes:
::: table-wide
| Description | R option | Environment variable | Default value |
|---|---|---|---|
| Whether or not to add a copyright notice at the beginning of the generated `.R` files as recommended by e.g. the GNU licenses. The notice consists of the name and description of the program and the word `Copyright (C)` followed by the release years and the name(s) of the copyright holder(s), or if not specified, the author(s). The year is always the current year. All the other information is extracted from the package’s `DESCRIPTION` file. | `pkgpurl.add_copyright_notice` | `R_PKGPURL_ADD_COPYRIGHT_NOTICE` | `TRUE` |
| Whether or not to add a license notice at the beginning of the generated `.R` files as recommended by e.g. the GNU licenses. The license is determined from the package’s `DESCRIPTION` file and currently only the `AGPL-3.0-or-later` license is supported. | `pkgpurl.add_license_notice` | `R_PKGPURL_ADD_LICENSE_NOTICE` | `TRUE` |
| Whether or not to overwrite pkgdown’s reference index in the configuration file `_pkgdown.yml` with an auto-generated one based on the main input file as described in `pkgpurl::gen_pkgdown_ref()`. | `pkgpurl.gen_pkgdown_ref` | `R_PKGPURL_GEN_PKGDOWN_REF` | `TRUE` |
:::
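As a rough illustration of the two configuration routes, the following sketch sets one value via an R option and another via an environment variable. The helper `cfg_val()` is hypothetical and not part of pkgpurl; it merely mimics the documented precedence (R option first, then environment variable, then default). That the environment variable is given as the string `"FALSE"` is an assumption as well.

```r
# Route 1: R option (takes precedence), e.g. in .Rprofile or the current session
options(pkgpurl.add_copyright_notice = FALSE)

# Route 2: environment variable, e.g. in .Renviron or via Sys.setenv()
# (assumption: logical values are passed as the strings "TRUE"/"FALSE")
Sys.setenv(R_PKGPURL_ADD_LICENSE_NOTICE = "FALSE")

# Hypothetical helper, NOT pkgpurl's implementation: resolves a logical
# config value in the order R option > environment variable > default
cfg_val <- function(option, env_var, default) {
  opt <- getOption(option)
  if (!is.null(opt)) {
    return(opt)
  }
  env <- Sys.getenv(env_var, unset = NA_character_)
  if (!is.na(env)) {
    return(as.logical(env))
  }
  default
}

copyright_notice <- cfg_val("pkgpurl.add_copyright_notice",
                            "R_PKGPURL_ADD_COPYRIGHT_NOTICE",
                            default = TRUE)
license_notice <- cfg_val("pkgpurl.add_license_notice",
                          "R_PKGPURL_ADD_LICENSE_NOTICE",
                          default = TRUE)
```

Setting the R option is the more direct route within a session, while the environment variable is handy for CI jobs or other non-interactive runs.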
This package’s source code is written in the R Markdown file format to facilitate practices commonly referred to as literate programming. It allows the actual code to be freely mixed with explanatory and supplementary information in expressive Markdown format instead of having to rely on `#` comments only.
All the `.gen.R` suffixed R source code found under `R/` is generated from the respective R Markdown counterparts under `Rmd/` using `pkgpurl::purl_rmd()`⁴. Always make changes only to the `.Rmd` files – never the `.R` files – and then run `pkgpurl::purl_rmd()` to regenerate the R source files.
This package borrows a lot of the Tidyverse design philosophies. The R code adheres to the principles specified in the Tidyverse Design Guide wherever possible and is formatted according to the Tidyverse Style Guide (TSG) with the following exceptions:
- Line width is limited to 160 characters, double the limit proposed by the TSG (80 characters is ridiculously little given today’s high-resolution wide screen monitors).

  Furthermore, the preferred style for breaking long lines differs. Instead of wrapping directly after an expression’s opening bracket as suggested by the TSG, we prefer two fewer line breaks and indent subsequent lines within the expression by its opening bracket:

      # TSG proposes this
      do_something_very_complicated(
        something = "that",
        requires = many,
        arguments = "some of which may be long"
      )

      # we prefer this
      do_something_very_complicated(something = "that",
                                    requires = many,
                                    arguments = "some of which may be long")

  This results in less vertical and more horizontal spread of the code and better readability in pipes.

- Usage of magrittr’s compound assignment pipe-operator `%<>%` is desirable⁵.

- Usage of R’s right-hand assignment operator `->` is not allowed⁶.

- R source code is not split over several files as suggested by the TSG but instead is (as far as possible) kept in the single file `Rmd/pkgpurl.Rmd` which is well-structured thanks to its Markdown support.
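For readers unfamiliar with the compound assignment pipe, here is a minimal sketch of how it compares to the ordinary forward pipe. It assumes magrittr is installed; the `scores` vector is made up for illustration.

```r
library(magrittr)

scores <- c(3.14159, 2.71828, 1.41421)

# Forward pipe plus explicit assignment to a new object
rounded <- scores %>% round(digits = 2)

# Compound assignment pipe: pipes `scores` into round() and
# assigns the result back to `scores` in one step
scores %<>% round(digits = 2)

identical(scores, rounded)
```

Both routes yield the same values; `%<>%` simply avoids repeating the object name on both sides of the assignment.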
As far as possible, these deviations from the TSG plus some additional restrictions are formally specified in `pkgpurl::default_linters`, which is (by default) used in `pkgpurl::lint_rmd()`, which in turn is the recommended way to lint this package.
Footnotes
1. Actually, you could write anything you like in any syntax outside of R code chunks as long as you don’t need the file to be knittable (which it doesn’t have to be).

2. It basically sends the (R) Markdown file on a “Pandoc round trip” on every file save.

3. I personally recommend using the shortcut Ctrl+Shift+V since it’s not occupied yet by any of the predefined RStudio shortcuts.

4. The very idea to leverage the R Markdown format to author R packages was originally proposed by Yihui Xie. See his excellent blog post for his point of view on the advantages of literate programming techniques and some practical examples. Note that using `pkgpurl::purl_rmd()` is a less cumbersome alternative to the Makefile approach outlined by him.

5. The TSG explicitly instructs you to avoid this operator – presumably because it’s relatively unknown and therefore might be confused with the forward pipe operator `%>%` when skimming code only briefly. I don’t consider this to be an actual issue since there aren’t many sensible usage patterns of `%>%` at the beginning of a pipe sequence inside a function – I can only think of creating side effects and relying on R’s implicit return of the last evaluated expression. Therefore – and because I really like the `%<>%` operator – its usage is welcome.

6. The TSG explicitly accepts `->` for assignments at the end of a pipe sequence while Google’s R Style Guide considers this bad practice because it “makes it harder to see in code where an object is defined”. I second the latter.