forked from hadley/r-pkgs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
vignettes.Rmd
417 lines (283 loc) · 21.4 KB
/
vignettes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
# Vignettes: long-form documentation {#vignettes}
```{r, include = FALSE}
source("common.R")
```
A vignette is a long-form guide to your package. Function documentation is great if you know the name of the function you need, but it's useless otherwise. A vignette is like a book chapter or an academic paper: it can describe the problem that your package is designed to solve, and then show the reader how to solve it. A vignette should divide functions into useful categories, and demonstrate how to coordinate multiple functions to solve problems. Vignettes are also useful if you want to explain the details of your package. For example, if you have implemented a complex statistical algorithm, you might want to describe all the details in a vignette so that users of your package can understand what's going on under the hood, and be confident that you've implemented the algorithm correctly.
Many existing packages have vignettes. You can see all the installed vignettes with `browseVignettes()`. To see the vignette for a specific package, use the argument, `browseVignettes("packagename")`. Each vignette provides three things: the original source file, a readable HTML page or PDF, and a file of R code. You can read a specific vignette with `vignette(x)`, and see its code with `edit(vignette(x))`. To see vignettes for a package you haven't installed, look at its CRAN page, e.g., <http://cran.r-project.org/web/packages/dplyr>.
Before R 3.0.0, the only way to create a vignette was with Sweave. This was challenging because Sweave only worked with LaTeX, and LaTeX is both hard to learn and slow to compile. Now, any package can provide a vignette __engine__, a standard interface for turning input files into HTML or PDF vignettes. In this chapter, we're going to use the R markdown vignette engine provided by [knitr](http://yihui.name/knitr/). I recommend this engine because:
* You write in Markdown, a plain text formatting system. Markdown is limited
compared to LaTeX, but this limitation is good because it forces you to
focus on the content.
* It can intermingle text, code and results (both textual and visual).
* Your life is further simplified by the [rmarkdown package](http://rmarkdown.rstudio.com/),
which coordinates Markdown and knitr by using [pandoc](http://johnmacfarlane.net/pandoc)
to convert Markdown to HTML and by providing many useful templates.
Switching from Sweave to R Markdown had a profound impact on my use of vignettes. Previously, making a vignette was painful and slow and I rarely did it. Now, vignettes are an essential part of my packages. I use them whenever I need to explain a complex topic, or to show how to solve a problem with multiple steps.
Currently, the easiest way to get R Markdown is to use [RStudio](http://www.rstudio.com/products/rstudio/download/preview/). RStudio will automatically install all of the needed prerequisites. If you don't use RStudio, you'll need to:
1. Install the rmarkdown package with `install.packages("rmarkdown")`.
1. [Install pandoc](http://johnmacfarlane.net/pandoc/installing.html).
## Vignette workflow {#vignette-workflow}
To create your first vignette, run:
```{r, eval = FALSE}
usethis::use_vignette("my-vignette")
```
This will:
1. Create a `vignettes/` directory.
1. Add the necessary dependencies to `DESCRIPTION` (i.e. it adds knitr to
the `Suggests` and `VignetteBuilder` fields).
1. Draft a vignette, `vignettes/my-vignette.Rmd`.
The draft vignette has been designed to remind you of the important parts of an R Markdown file. It serves as a useful reference when you're creating a new vignette.
Once you have this file, the workflow is straightforward:
1. Modify the vignette.
2. Press Ctrl/Cmd + Shift + K (or click
`r knitr::include_graphics("images/knit.png")`) to knit the
vignette and preview the output.
There are three important components to an R Markdown vignette:
* The initial metadata block.
* Markdown for formatting text.
* Knitr for intermingling text, code and results.
These are described in the following sections.
## Metadata {#vignette-metadata}
The first few lines of the vignette contain important metadata. The default template contains the following information:
---
title: "Vignette Title"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Vignette Title}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
This metadata is written in [yaml](http://www.yaml.org/), a format designed to be both human and computer readable. The basics of the syntax is much like the `DESCRIPTION` file, where each line consists of a field name, a colon, then the value of the field. The one special YAML feature we're using here is `>`. It indicates the following lines of text are plain text and shouldn't use any special YAML features.
The fields are:
* Title, author and date: this is where you put the vignette's title, author and
date. You'll want to fill these in yourself (you can delete them
if you don't want the title block at the top of the page). The date is filled
in by default: it uses a special knitr syntax (explained below) to insert
today's date.
* Output: this tells rmarkdown which output formatter to use.
There are many options that are useful for regular reports (including
html, pdf, slideshows, ...) but `rmarkdown::html_vignette` has been
specifically designed to work well inside packages. See
`?rmarkdown::html_vignette` for more details.
* Vignette: this contains a special block of metadata needed by R. Here, you can
see the legacy of LaTeX vignettes: the metadata looks like LaTeX commands.
You'll need to modify the `\VignetteIndexEntry` to provide the title of your
vignette as you'd like it to appear in the vignette index. Leave the other
two lines as is. They tell R to use `knitr` to process the file, and that the
file is encoded in UTF-8 (the only encoding you should ever use to write
vignettes).
## Markdown {#markdown}
R Markdown vignettes are written in Markdown, a light weight markup language. John Gruber, the author of Markdown, summarises the goals and philosophy of Markdown:
> Markdown is intended to be as easy-to-read and easy-to-write as is feasible.
>
> Readability, however, is emphasized above all else. A Markdown-formatted
> document should be publishable as-is, as plain text, without looking like
> it’s been marked up with tags or formatting instructions. While Markdown’s
> syntax has been influenced by several existing text-to-HTML filters —
> including Setext, atx, Textile, reStructuredText, Grutatext, and EtText —
> the single biggest source of inspiration for Markdown’s syntax is the format
> of plain text email.
>
> To this end, Markdown’s syntax is comprised entirely of punctuation
> characters, which punctuation characters have been carefully chosen so as
> to look like what they mean. E.g., asterisks around a word actually look
> like *emphasis*. Markdown lists look like, well, lists. Even blockquotes
> look like quoted passages of text, assuming you’ve ever used email.
Markdown isn't as powerful as LaTeX, reStructuredText or docbook, but it's simple, easy to write, and easy to read even when it's not rendered. I find Markdown's constraints helpful for writing because it lets me focus on the content, and prevents me from messing around with the styling.
If you've never used Markdown before, a good place to start is John Gruber's [Markdown syntax documentation](http://daringfireball.net/projects/markdown/syntax). Pandoc's implementation of Markdown rounds off some of the rough edges and adds a number of new features, so I also recommend familiarising yourself with the [pandoc readme](http://johnmacfarlane.net/pandoc/README.html). When editing a Markdown document, RStudio presents a drop-down menu via the question mark icon, which offers a Markdown reference card.
The sections below show you what I think are the most important features of pandoc's Markdown dialect. You should be able to learn the basics in under 15 minutes.
### Sections
Headings are identified by `#`:
# Heading 1
## Heading 2
### Heading 3
Create a horizontal rule with three or more hyphens (or asterisks):
--------
********
### Lists
Basic unordered lists use `*`:
* Bulleted list
* Item 2
* Nested bullets need a 4-space indent.
* Item 2b
If you want multiparagraph lists, the second and subsequent paragraphs need additional indenting:
* It's possible to put multiple paragraphs of text in a list item.
But to do that, the second and subsequent paragraphs must be
indented by four or more spaces. It looks better if the first
bullet is also indented.
Ordered lists use: `1.`:
1. Item 1
1. Item 2
1. Items are numbered automatically, even though they all start with 1.
You can intermingle ordered and bulleted lists, as long as you adhere to the four space rule:
1. Item 1.
* Item a
* Item b
1. Item 2.
Definition lists use ` : `
Definition
: a statement of the exact meaning of a word, especially in a dictionary.
List
: a number of connected items or names written or printed consecutively,
typically one below the other.
: barriers enclosing an area for a jousting tournament.
### Inline formatting
Inline format is similarly simple:
_italic_ or *italic*
__bold__ or **bold**
[link text](destination)
<http://this-is-a-raw-url.com>
### Tables
There are [four types of tables](http://johnmacfarlane.net/pandoc/README.html#tables). I recommend using the pipe table which looks like this:
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
| 12 | 12 | 12 | 12 |
| 123 | 123 | 123 | 123 |
| 1 | 1 | 1 | 1 |
Notice the use of the `:` in the spacer under the heading. This determines the alignment of the column.
If the data underlying your table exists in R, don't lay it out by hand. Instead use `knitr::kable()`, or look at [printr](https://github.com/yihui/printr) or [pander](http://rapporter.github.io/pander/).
### Code
For inline code use `` `code` ``.
For bigger blocks of code, use ```` ``` ````. These are known as "fenced" code blocks:
```
# A comment
add <- function(a, b) a + b
```
To add syntax highlighting to the code, put the language name after the backtick:
```c
int add(int a, int b) {
return a + b;
}
```
(At time of printing, languages supported by pandoc were: actionscript, ada, apache, asn1, asp, awk, bash, bibtex, boo, c, changelog, clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css, curry, d, diff, djangotemplate, doxygen, doxygenlua, dtd, eiffel, email, erlang, fortran, fsharp, gnuassembler, go, haskell, haxe, html, ini, java, javadoc, javascript, json, jsp, julia, latex, lex, literatecurry, literatehaskell, lua, makefile, mandoc, matlab, maxima, metafont, mips, modula2, modula3, monobasic, nasm, noweb, objectivec, objectivecpp, ocaml, octave, pascal, perl, php, pike, postscript, prolog, python, r, relaxngcompact, rhtml, ruby, rust, scala, scheme, sci, sed, sgml, sql, sqlmysql, sqlpostgresql, tcl, texinfo, verilog, vhdl, xml, xorg, xslt, xul, yacc, yaml. Syntax highlighting is done by the haskell package [highlighting-kate](http://johnmacfarlane.net/highlighting-kate); see the website for current list.)
When you include R code in your vignette, you usually won't use ```` ```r ````. Instead, you'll use ```` ```{r} ````, which is specially processed by knitr, as described next.
## Knitr {#knitr}
Knitr allows you to intermingle code, results and text. Knitr takes R code, runs it, captures the output, and translates it into formatted Markdown. Knitr captures all printed output, messages, warnings, errors (optionally) and plots (basic graphics, lattice & ggplot and more).
Consider the simple example below. Note that a knitr block looks similar to a fenced code block, but instead of using `r`, you are using `{r}`.
```{r}
# Add two numbers together
add <- function(a, b) a + b
add(10, 20)
```
This generates the following Markdown:
```r
# Add two numbers together
add <- function(a, b) a + b
add(10, 20)
## [1] 30
```
Which, in turn, is rendered as:
```r
# Add two numbers together
add <- function(a, b) a + b
add(10, 20)
## 30
```
Once you start using knitr, you'll never look back. Because your code is always run when you build the vignette, you can rest assured knowing that all your code works. There's no way for your input and output to be out of sync.
### Options
You can specify additional options to control the rendering:
* To affect a single block, add the block settings:
```{r, opt1 = val1, opt2 = val2}
# code
```
* To affect all blocks, call `knitr::opts_chunk$set()` in a knitr block:
```{r, echo = FALSE}
knitr::opts_chunk$set(
opt1 = val1,
opt2 = val2
)
```
The most important options are:
* `eval = FALSE` prevents evaluation of the code. This is useful if you want
to show some code that would take a long time to run. Be careful when you
use this: since the code is not run, it's easy to introduce bugs.
(Also, your users will be puzzled when they copy & paste code and it doesn't
work.)
* `echo = FALSE` turns off the printing of the code _input_ (the output
will still be printed). Generally, you shouldn't use this in vignettes
because understanding what the code is doing is important. It's more useful
when writing reports since the code is typically less important than the
output.
* `results = "hide"` turns off the printing of code _output_.
* `warning = FALSE` and `message = FALSE` suppress the display of warnings
and messages.
* `error = TRUE` captures any errors in the block and shows them inline.
This is useful if you want to demonstrate what happens if code throws an error.
Whenever you use `error = TRUE`, you also need to use `purl = FALSE`. This
is because every vignette is accompanied by a file code that contains all the
code from the vignette. R must be able to source that file without errors,
and `purl = FALSE` prevents the code from being inserted into that document.
* `collapse = TRUE` and `comment = "#>"` are my preferred way of displaying
code output. I usually set these globally by putting the following knitr
block at the start of my document.
```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
* `results = "asis"` treats the output of your R code as literal Markdown.
This is useful if you want to generate text from your R code. For example,
if you want to generate a table using the pander package, you'd do:
```{r, results = "asis"}
pander::pandoc.table(iris[1:3, 1:4])
```
That generates a Markdown table that looks like:
--------------------------------------------------------
Sepal.Length Sepal.Width Petal.Length Petal.Width
-------------- ------------- -------------- -------------
5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
4.7 3.2 1.3 0.2
---------------------------------------------------------
Which makes a table that looks like:
--------------------------------------------------------
Sepal.Length Sepal.Width Petal.Length Petal.Width
-------------- ------------- -------------- -------------
5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
4.7 3.2 1.3 0.2
---------------------------------------------------------
* `fig.show = "hold"` holds all figures until the end of the code block.
* `fig.width = 5` and `fig.height = 5` set the height and width of figures
(in inches).
Many other options are described at <http://yihui.name/knitr/options>.
## Development cycle {#vignette-workflow-2}
Run code a chunk at a time using Cmd + Alt + C. Re-run the entire document in a fresh R session using Knit (Ctrl/Cmd + Shift + K).
You can build all vignettes from the console with `devtools::build_vignettes()`, but this is rarely useful. Instead use `devtools::build()` to create a package bundle with the vignettes included. RStudio's "Build & reload" does not build vignettes to save time. Similarly, `devtools::install_github()` (and friends) will not build vignettes by default because they're time consuming and may require additional packages. You can force building with `devtools::install_github(build_vignettes = TRUE)`. This will also install all suggested packages.
## Advice for writing vignettes {#vignette-advice}
> If you're thinking without writing, you only think you're thinking.
> --- Leslie Lamport
When writing a vignette, you're teaching someone how to use your package. You need to put yourself in the readers' shoes, and adopt a "beginner's mind". This can be difficult because it's hard to forget all of the knowledge that you've already internalised. For this reason, I find teaching in-person a really useful way to get feedback on my vignettes. Not only do you get that feedback straight away but it's also a much easier way to learn what people already know.
A useful side effect of this approach is that it helps you improve your code. It forces you to re-see the initial onboarding process and to appreciate the parts that are hard. Every time that I've written text that describes the initial experience, I've realised that I've missed some important functions. Adding those functions not only helps my users, but it often also helps me! (This is one of the reasons that I like writing books).
* I strongly recommend literally anything written by Kathy Sierra. Her old blog,
[Creating passionate users](http://headrush.typepad.com/) is full of advice
about programming, teaching, and how to create valuable tools. I thoroughly
recommend reading through all the older content. Her new blog,
[Serious Pony](http://seriouspony.com/blog/), doesn't have as much content,
but it has some great articles.
* If you'd like to learn how to write better, I highly recommend
[Style: Lessons in Clarity and Grace](http://amzn.com/0321898680) by
Joseph M. Williams and Joseph Bizup. It helps you understand the structure of
writing so that you'll be better able to recognise and fix bad writing.
Writing a vignette also makes a nice break from coding. In my experience, writing uses a different part of the brain from programming, so if you're sick of programming, try writing for a bit. (This is related to the idea of [structured procrastination](http://www.structuredprocrastination.com/).).
### Organisation
For simpler packages, one vignette is often sufficient. But for more complicated packages you may actually need more than one. In fact, you can have as many vignettes as you like. I tend to think of them like chapters of a book - they should be self-contained, but still link together into a cohesive whole.
Although it's a slight hack, you can link various vignettes by taking advantage of how files are stored on disk: to link to vignette `abc.Rmd`, just make a link to `abc.html`.
## CRAN notes {#vignette-cran}
Note that since you build vignettes locally, CRAN only receives the html/pdf and the source code. However, CRAN does not re-build the vignette. It only checks that the code is runnable (by running it). This means that any packages used by the vignette must be declared in the `DESCRIPTION`. But this also means that you can use Rmarkdown (which uses pandoc) even though CRAN doesn't have pandoc installed.
Common problems:
* The vignette builds interactively, but when checking, it fails with an error
about a missing package that you know is installed. This means that you've
forgotten to declare that dependency in the `DESCRIPTION` (usually it should
go in `Suggests`).
* Everything works interactively, but the vignette doesn't show up after
you've installed the package. One of the following may have occurred. First,
because RStudio's "build and reload" doesn't build vignettes, you may need
to run `devtools::install()` instead. Next check:
1. The directory is called `vignettes/` and not `vignette/`.
1. Check that you haven't inadvertently excluded the vignettes with
`.Rbuildignore`
1. Ensure you have the necessary vignette metadata.
* If you use `error = TRUE`, you must use `purl = FALSE`.
You'll need to watch the file size. If you include a lot of graphics, it's easy to create a very large file. There are no hard and fast rules, but if you have a very large vignette be prepared to either justify the file size, or to make it smaller.
## Where next {#where-next}
If you'd like more control over the appearance of your vignette, you'll need to learn more about Rmarkdown. The website, <http://rmarkdown.rstudio.com>, is a great place to start. There you can learn about alternative output formats (like LaTeX and pdf) and how you can incorporate raw HTML and LaTeX if you need additional control.
If you write a nice vignette, consider submitting it to the _Journal of Statistical Software_ or _The R Journal_. Both journals are electronic only and peer-reviewed. Comments from reviewers can be very helpful for improving the quality of your vignette and the related software.