forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathExceptions-Debugging.rmd
498 lines (358 loc) · 18.1 KB
/
Exceptions-Debugging.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
---
title: Exceptions and debugging
layout: default
---
# Exceptions and debugging
<!-- http://www.noamross.net/blog/2013/4/18/r-debug-tools.html -->
This chapter describes techniques to use when things go wrong:
* Debugging: figuring out what went wrong.
* Exceptions: the set of objects that underlies error handling in R.
* Defensive programming: writing
As with many other parts of R, the approach to dealing with errors and exceptions comes from a LISP-heritage, and is quite different (although some of the terminology is the same) from that of languages like Java.
## Debugging
This section discusses how to debug from the command-line. Modern R guis, like Rstudio, also provide built in debugging tools. These are built on top of the tools I'll describe below but may be exposed in a more user friendly way that requires less typing.
### Traceback
The key function for performing a post-mortem on an error is `traceback`, which shows all the calls leading up to the error. Here's an example:
```{r, eval = TRUE}
f <- function() g()
g <- function() h()
h <- function() i()
i <- function() "a" + 1
f()
# Error in "a" + 1 : non-numeric argument to binary operator
traceback()
# 4: i() at #1
# 3: h() at #1
# 2: g() at #1
# 1: f()
```
This is very helpful to determine exactly where in a stack of calls an error occured. However, it's not so helpful if you have a recursive function, or other situations where the same function is called in multiple places:
```{r, eval = FALSE}
j <- function(i = 5) {
if (i == 1) "a" + 1
j(i - 1)
}
j()
# Error in "a" + 1 : non-numeric argument to binary operator
traceback()
# 5: j(i - 1) at #3
# 4: j(i - 1) at #3
# 3: j(i - 1) at #3
# 2: j(i - 1) at #3
# 1: j()
```
### Browser
Trackback can help you figure out where the error occurred, but to understand why the error occured and to fix it, it's often easier to explore interactively. `browser()` allows you to do this by pausing execution and returning you to an interactive state. Here you can run any regular R command, as well as some extra single letter commands:
* `c`: leave interactive debugging and continue execution
* `n`: execute the next step. Be careful if you have a variable named `n`: to
print it you'll need to be explicit `print(n)`.
* `\n`: the default behaviour is the same as `c`, but this is somewhat
dangerous as it makes it very easy to accidentally continue during
debugging. I recommend `options(browserNLdisabled = TRUE)` so that a new
line is simply ignored.
* `Q`: stops debugging, terminate the function and return to the global
workspace
* `where`: prints stack trace of active calls (the interactive equivalent of
`traceback`)
Don't forget that you can combine `if` statements with `browser()` to only debug when a certain situation occurs.
### Browsing arbitrary R code
As well as adding `browser()` yourself, there are two functions that will added it to code:
* `debug()` inserts a browser statement in the first line of the specified
function. `undebug` will remove it, or you can use `debugonce` to insert a
browser call for the next run, and have it automatically removed afterwards.
* `utils::setBreakpoint()` does the same thing, but instead inserts `browser()`
in the function corresponding to the specified file name and line number.
These two functions are both special cases of `trace()`, which allows you to insert arbitrary code in any position in an existing function. The complement of `trace()` is `untrace()`. You can only perform one trace per function - subsequent traces will replace prior.
Locating warnings is a little trickier. The easiest way to turn it in an error with `options(warn = 2)` and then use the standard functions described above. Turn back to default behaviour with `options(warn = 0)`.
### Browsing on error
It's also possible to start `browser` automatically when an error occurs, by setting `options(error = browser)`. This will start the interactive debugger in the environment in which the error occurred. Other functions that you can supply to `error` are:
* `recover`: a step up from `browser`, as it allows you to drill down into any
of the calls in the call stack. This is useful because often the cause of
the error is a number of calls back - you're just seeing the consequences.
This is the result of "fail-slow" code
* `dump.frames`: an equivalent to `recover` for non-interactive code. Will
save an `rdata` file containing the nested environments where the error
occurred. This allows you to later use `debugger` to re-create the error as
if you had called `recover` from where the error occurred
```{r, eval = FALSE}
# Saves debugging info to file last.dump.rda
options(error = quote({dump.frames(to.file = TRUE); q()}))
# Then in an interactive R session:
print(load("last.dump.rda"))
debugger("last.dump")
```
* `NULL`: the default. Prints an error message and stops function execution.
Use this to reset back to the regular behaviour.
Warnings are harder to track down because they don't provide any information about where they occured. Currently, the best way to debug them into turn them into errors using `options(warn = 2)`: then you can apply any of the techniques described above.
## Exceptions
The fine details of exceptions are not particularly well documented in R. If you want to learn more about the internals, I recommend the following two primary sources:
* [A prototype of a condition system for R](http://homepage.stat.uiowa.edu/~luke/R/exceptions/simpcond.html) by Robert Gentleman and Luke Tierney. This is describes an early version of R's condition system. The implementation changed somewhat since this was written, but it provides a good overview of how the pieces fit together, and some motivation for the design.
* [Beyond Exception Handling: Conditions and Restarts](http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html) by Peter Seibel. This describes exception handling in LISP, but the ideas are basically
the same in R, and it provides some more complicated use cases.
### Creation
You create errors in R with `stop()`.
### Basic error handling
Error handling is performed with the `try()` (simple) and `tryCatch()` (complex) functions. `try()` allows execution to continue even after an exception has occured. For example, normally if you run a function that throws an error, it terminates immediately and doesn't return a value:
```{r}
f1 <- function(x) {
log(x)
10
}
f1("x")
```
However, if you wrap the statement that creates the error in `try()`, the error message will still be printed but execution will continue:
```{r}
f2 <- function(x) {
try(log(x))
10
}
f2()
```
Note that you pass larger blocks of code to `try()` by wrapping them in `{}`:
```{r}
try({
a <- 1
b <- "x"
a + b
})
a
b
```
You can also capture the output of the `try()` function. If succesful, it will be the last result evaluated in the block (just like a function); if unsuccesful it will be an (invisible) object of class "try-error":
```{r}
success <- try(1 + 2)
failure <- try("a" + "b")
str(success)
str(failure)
```
You can use the second argument to `try()`, `silent`, to suppress the printing of the error message.
`try()` is particularly useful when you're applying a function to multiple elements in a list:
```{r}
elements <- list(1:10, c(-1, 10), c(T, F), letters)
results <- lapply(elements, log)
results <- lapply(elements, function(x) try(log(x)))
```
There isn't a built-in function for testing for this class, so we'll define one. Then you can easily find the locations of errors with `sapply()` (learn more about it in the functionals chapter), and extract the successes or look at the inputs that lead to failures.
```{r}
is.error <- function(x) inherits(x, "try-error")
succeeded <- !sapply(results, is.error)
# look at successful results
str(results[succeeded])
# look at inputs that failed
str(elements[!succeeded])
```
Another useful `try()` idiom is setting a default value if an expression fails. Simply assign the default value outside the try block, and then run the risky code:
```{r, eval = FALSE}
default <- NULL
try(default <- read.csv("possibly-bad-input.csv"), silent = TRUE)
```
### Advanced error handling
`tryCatch` gives more control than `try`, but to understand how it works, we first need to learn a little about conditions, the S3 objects that represent errors, warnings and messages.
```{r}
is.condition <- function(x) inherits(x, "condition")
```
There are three convenience methods for creating errors, warnings and messages. All take two arguments: the `message` to display, and an optional `call` indicating where the condition was created
```{r}
e <- simpleError("My error", quote(f(x = 71)))
w <- simpleWarning("My warning")
m <- simpleMessage("My message")
```
There is one class of conditions that can't be directly: interrupts, which occur when the user presses Ctrl + Break, Escape, or Ctrl + C (depending on the platform) to terminate execution.
The components of a condition can be extracted with `conditionMessage` and `conditionCall`:
```{r}
conditionMessage(e)
conditionCall(e)
```
Conditions can be signalled using `signalCondition`. By default, no one is listening, so this doesn't do anything.
```{r}
signalCondition(e)
signalCondition(w)
signalCondition(m)
```
To listen to signals, we have two tools: `tryCatch()` and `withCallingHandlers()`. `tryCatch()` is an exiting handler: it catches the condition, but the rest of the code after the exception is not run. `withCallingHandlers()` sets up calling handlers: it catches the condition, and then resumes execution of the code. We will focus first on `tryCatch()`.
The `tryCatch()` call has three arguments:
* `expr`: the code to run.
* `...`: a set of named arguments setting up error handlers. If an error
occurs, `tryCatch` will call the first handler whose name matches one of the
classes of the condition. The only useful names for built-in conditions are
`interrupt`, `error`, `warning` and `message`.
* `finally`: code to run regardless of whether `expr` succeeds or fails. This
is useful for clean up, as described below. All handlers have been turned
off by the time the `finally` code is run, so errors will propagate as
usual.
The following examples illustrate the basic properties of `tryCatch`:
```{r}
# Handlers are passed a single argument
tryCatch(stop("error"),
error = function(...) list(...)
)
# This argument is the signalled condition, so we'll call
# it c for short.
# If multiple handlers match, the first is used
tryCatch(stop("error"),
error = function(c) "a",
error = function(c) "b"
)
# If multiple signals are nested, the the most internal is used first.
tryCatch(
tryCatch(stop("error"), error = function(c) "a"),
error = function(c) "b"
)
# Uncaught signals propagate outwards.
tryCatch(
tryCatch(stop("error")),
error = function(c) "b"
)
# The first handler that matches a class of the condition is used,
# not the "best" match:
a <- structure(list(message = "my error", call = quote(a)),
class = c("a", "error", "condition"))
tryCatch(stop(a),
error = function(c) "error",
a = function(c) "a"
)
tryCatch(stop(a),
a = function(c) "a",
error = function(c) "error"
)
# No matter what happens, finally is run:
tryCatch(stop("error"),
finally = print("Done."))
tryCatch(a <- 1,
finally = print("Done."))
# Any errors that occur in the finally block are handled normally
a <- 1
tryCatch(a <- 2,
finally = stop("Error!"))
```
What can handler functions do?
* Return a value.
* Pass the condition along, by re-signalling the error with `stop(c)`, or
`signalCondition(c)` for non-error conditions.
* Kill the function completely and return to the top-level with
`invokeRestart("abort")`
* Invoke another restart defined by `withRestarts()`.
We can write a simple version of `try` using `tryCatch`. The real version of `try` is considerably more complicated to preserve the usual error behaviour.
```{r}
try <- function(code, silent = FALSE) {
tryCatch(code, error = function(c) {
if (!silent) message("Error:", conditionMessage(c))
invisible(structure(conditionMessage(c), class = "try-error"))
})
}
try(1)
try(stop("Hi"))
try(stop("Hi"), silent = TRUE)
rm(try)
withCallingHandlers({
a <- 1
stop("Error")
a <- 2
}, error = function(c) {})
```
### Using `tryCatch`
With the basics in place, we'll next develop some useful tools based the ideas we just learned about.
The `finally` argument to `tryCatch` is particularly useful for clean up, because it is always called, regardless of whether the code executed successfully or not. This is useful when you have:
* modified `options`, `par` or locale
* opened connections, or created temporary files and directories
* opened graphics devices
* changed the working directory
* modified environment variables
The following function changes the working directory, executes some code, and always resets the working directory back to what it was before, even if the code raises an error.
```{r}
in_dir <- function(path, code) {
cur_dir <- getwd()
tryCatch({
setwd(path)
force(code)
}, finally = setwd(cur_dir))
}
getwd()
in_dir(R.home(), dir())
getwd()
in_dir(R.home(), stop("Error!"))
getwd()
```
Another more casual way of cleaning up is the `on.exit` function, which is called when the function terminates. It's not as fine grained as `tryCatch`, but it's a bit less typing.
```{r}
in_dir <- function(path, code) {
cur_dir <- getwd()
on.exit(setwd(cur_dir))
force(code)
}
```
If you're using multiple `on.exit` calls, make sure to set `add = TRUE`, otherwise they will replace the previous call. **Caution**: Unfortunately the default in `on.exit()` is `add = FALSE`, so that every time you run it, it overwrites existing exit expressions. Because of the way `on.exit()` is implemented, it's not possible to create a variant with `add = TRUE`, so you must be careful when using it.
### Exercises
1. Write a function that opens a graphics device, runs the supplied code, and closes the graphics device (always, regardless of whether or not the plotting code worked).
## Defensive programming
Defensive programming is the art of making code fail in a well-defined manner even when something unexpected occurs. There are two components of this art related to exceptions: raising exceptions as soon as you notice something has gone wrong, and responding to errors as cleanly as possible.
A general principle for errors is to "fail fast" - as soon as you figure out something as wrong, and your inputs are not as expected, you should raise an error. This is more work for you as the function author, but will make it easier for the user to debug because they get errors early on, not after unexpected input has passed through several functions and caused a problem.
There is a tension between interactive analysis and programming. When you a doing an analysis, you want R to do what you mean, and if it guesses wrong, then you'll discover it right away and can fix it. If you're creating a function, then you want to make it as robust as possible so that any problems become apparent right away (see fail fast below).
* Be explicit:
* Check the types of inputs
* Be explicit about missings:
* Avoid functions that have non-standard evaluation rules (i.e
`subset`, `with`, `transform`). These functions save you time when working
interactively, but when they fail inside a function they usually don't
return a useful error message.
* Avoid functions that can return different types of objects:
* Make sure you use preserving subsetting.
* Don't use `sapply()`: use `vapply()`, or `lapply()` plus the appropriate
transformation
### Creating
There are a number of options for letting the user know when something has gone wrong:
* don't use `cat()` or `print()`, except for print methods, or for optional
debugging information.
* use `message()` to inform the user about something expected - I often do
this when filling in important missing arguments that have a non-trivial
computation or impact. Two examples are `reshape2::melt` package, which
informs the user what melt and id variables were used if not specified, and
`plyr::join`, which informs which variables were used to join the two
tables. You can suppress messages with `suppressMessages`.
* use `warning()` for unexpected problems that aren't show stoppers.
`options(warn = 2)` will turn warnings into errors. Warnings are often
more appropriate for vectorised functions when a single value in the vector
is incorrect, e.g. `log(-1:2)` and `sqrt(-1:2)`. You can suppress warnings
with `suppressWarnings`
* use `stop()` when the problem is so big you can't continue
* `stopifnot()` is a quick and dirty way of checking that pre-conditions for
your function are met. The problem with `stopifnot` is that if they aren't
met, it will display the test code as an error, not a more informative
message. Checking pre-conditions with `stopifnot` is better than nothing,
but it's better still to check the condition yourself and return an
informative message with `stop()`
### An example
The following function is naively written and might cause problems:
```{r}
col_means <- function(df) {
numeric <- sapply(df, is.numeric)
numeric_cols <- df[, numeric]
data.frame(lapply(numeric_cols, mean))
}
```
The ability to come up with a set of potential pathological inputs is a good skill to master. Common cases that I try and check are:
* dimensions of length 0
* dimensions of length 1 (in case dropping occurs)
* incorrect input types
The following code exercises some of those cases for `col_means`
```{r}
col_means(mtcars)
col_means(mtcars[, 0])
col_means(mtcars[0, ])
col_means(mtcars[, "mpg", drop = F])
col_means(1:10)
col_means(as.matrix(mtcars))
col_means(as.list(mtcars))
mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2)
```
A better version of `col_means` might be:
```{r}
col_means <- function(df) {
numeric <- vapply(df, is.numeric, logical(1))
numeric_cols <- df[, numeric, drop = FALSE]
data.frame(lapply(numeric_cols, mean))
}
```
We use `vapply` instead of `sapply`, remember to use `drop = FALSE`. It still doesn't check that the input is correct, or coerce it to the correct format.