forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
R6.Rmd
530 lines (392 loc) · 18.9 KB
/
R6.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
# R6
```{r setup, include = FALSE}
source("common.R")
library(R6)
```
This chapter describes the R6 object system. Unlike S3 and S4, it provides encapsulated OO, which means that:
* R6 methods belong to objects, not generics.
* R6 objects are mutable: the usual copy-on-modify semantics do not apply.
These properties make R6 objects behave more like objects in programming languages such as Python, Ruby and Java. This does not mean that R6 is good, and S3 and S4 are bad, it just means that R has a different heritage than most modern mainstream programming languages.
R6 is very similar to a built-in OO system called __reference classes__, or RC for short. I'm going to teach you R6 instead of RC for four reasons:
* R6 is much simpler. Both R6 and RC are built on top of environments, but
while R6 uses S3, RC uses S4. R6 is only ~500 lines of R code (and ~1700
lines of tests!). We're not going to discuss the implementation in depth
here, but if you've mastered the contents of this book, you should be able
to read the source code and figure out how it works.
* RC mingles variables and fields in the same stack of environments so that you
get (`field`) and set fields (`field <<- value`) like regular values. R6 puts
fields in a separate environment so you get (`self$field`) and set
(`self$field <- value`) with a prefix. The R6 approach is more verbose but is
worth the tradeoff because it makes code easier to understand. It also makes
inheritance across packages simpler and more robust.
* R6 is much faster than RC. Generally, the speed of method dispatch is not
important outside of microbenchmarks but R6 is substantially better than
RC. Switching from RC to R6 yielded substantial performance in shiny.
`vignette("Performance", "R6")` provides more details on the performance.
* Because the ideas that underlie R6 and RC are similar, it will only require
a small amount of additional effort to learn RC if you need to.
Because R6 is not built into base R, you'll need to install and load a package in order to use it:
```{r}
library(R6)
```
If you'd like to learn more about R6 after reading this chapter, the best place to start is the vignettes included in the package. You can list them by calling `browseVignettes(package = "R6")`.
## Classes and methods
R6 only needs a single function call to create both the class and its methods: `R6::R6Class()`. And this is the only function from the package that you'll ever use! The following example shows the two most important arguments:
* The first argument is the `classname`. It's not strictly needed, but it
improves error messages and makes it possible to also use R6 objects
with S3 generics. By convention, R6 classes use UpperCamelCase.
* The second argument, `public`, supplies a list of methods (functions) and
fields (anything else) that make up the public interface of the object.
By convention, methods and fields use snake_case. Methods can access
the methods and fields of the current object via `self$`.
```{r}
Accumulator <- R6Class("Accumulator", list(
sum = 0,
add = function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})
)
```
You should always assign the result of `R6Class()` into a variable with the same name as the class. This creates an R6 object that defines the R6 class:
```{r}
Accumulator
```
You construct a new object from the class by calling the `new()` method. Methods belong to R6 objects so you use `$` to access `new()`:
```{r}
x <- Accumulator$new()
```
You can then call methods and access fields with `$`:
```{r}
x$add(4)
x$sum
```
In this class, the fields and methods are public which means that you can get or set the value of any field. Later, we'll see how to use private fields and methods to prevent casual access to the internals of your class.
To make it clear when we're talking about fields and methods as opposed to variables and functions, when referring to them in text, we'll prefix with `$`. For example, the `Accumulate` class has field `$sum` and method `$add()`.
### Method chaining
`$add()` is called primarily for its side-effect of updating `$sum`.
```{r}
Accumulator <- R6Class("Accumulator", list(
sum = 0,
add = function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})
)
```
Side-effect R6 methods should always return `self` invisibly. This returns the "current" object and makes it possible to chain together multiple method calls:
```{r}
x$add(10)$add(10)$sum
```
Alternatively, for long chains, you can spread the call over multiple lines:
```{r}
x$
add(10)$
add(10)$
sum
```
This technique is called __method chaining__ and is commonly used in encapsulated OO languages (like Python and JavaScript) to create fluent interfaces. Method chaining is deeply related to the pipe, and we'll discuss the pros and cons of each approach in [pipe vs message-chaining tradeoffs](#tradeoffs-pipe).
### Important methods
There are two important methods that will be defined for most classes: `$initialize()` and `$print()`. You don't have to provide them, but it's a good idea to do so because they will make your class easier to use.
`$initialize()` overrides the default behaviour of `$new()`. For example, the following code defines an R6 Person class, similar to the S4 equivalent in [S4]. Unlike S4, R6 provides no checks for object type by default. `$initialize()` is a good place to check that `name` and `age` are the correct types.
```{r}
Person <- R6Class("Person", list(
name = NULL,
age = NA,
initialize = function(name, age = NA) {
stopifnot(is.character(name), length(name) == 1)
stopifnot(is.numeric(age), length(age) == 1)
self$name <- name
self$age <- age
}
))
hadley <- Person$new("Hadley", age = 37)
```
If you have more expensive validation requirements, implement them in a separate `$validate()` and only call when needed.
Defining `$print()` allows you to override the default printing behaviour. As with any R6 method called for its side effects, `$print()` should return `invisible(self)`.
```{r}
Person <- R6Class("Person", list(
name = NULL,
age = NA,
initialize = function(name, age = NA) {
self$name <- name
self$age <- age
},
print = function(...) {
cat("Person: \n")
cat(" Name: ", self$name, "\n", sep = "")
cat(" Age: ", self$age, "\n", sep = "")
invisible(self)
}
))
hadley2 <- Person$new("Hadley")
hadley2
```
This code illustrates an important aspect of R6. Because methods are bound to individual objects, the previously created `hadley` does not get this new method:
```{r}
hadley
```
Indeed, from the perspective of R6, there is no relationship between `hadley` and `hadley2`. This can make interactive experimentation with R6 confusing. If you're changing the code and can't figure out why the results of method calls aren't changed, make sure you've re-constructed R6 objects with the new class.
There's a useful alternative to `$print()`: implement `$format()`, which should return a character vector. This will automatically be used by both `print()` and `format()` S3 generics.
```{r}
Person <- R6Class("Person", list(
age = NA,
name = NULL,
initialize = function(name, age = NA) {
self$name <- name
self$age <- age
},
format = function(...) {
# The first `paste0()` is not necessary but it lines up
# with the subsequent lines making it easier to see how
# it will print
c(
paste0("Person:"),
paste0(" Name: ", self$name),
paste0(" Age: ", self$age)
)
}
))
hadley3 <- Person$new("Hadley")
format(hadley3)
hadley3
```
### Adding methods after creation
Instead of continuously creating new classes, it's also possible to modify the methods of an existing class. This is useful when exploring interactively, and when you have a class with many functions that you'd like to break up into pieces.
Once the class has been defined, you can add elements to it with `$set()`, supplying the visibility (more on that below), the name, and the component.
```{r}
Accumulator <- R6Class("Accumulator")
Accumulator$set("public", "sum", 0)
Accumulator$set("public", "add", function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})
```
`$set()` will not overwrite an existing method unless you explicitly ask for it:
```{r, error = TRUE}
Accumulator$set("public", "sum", 1)
Accumulator$set("public", "sum", 1, overwrite = TRUE)
```
Also note that adding methods will only affect new objects generated from the class. It does not retrospectively apply to existing objects:
```{r, error = TRUE}
x1 <- Accumulator$new()
Accumulator$set("public", "hello", function() message("Hi!"))
x1$hello()
x2 <- Accumulator$new()
x2$hello()
```
### Inheritance
To inherit behaviour from an existing class, provide the class object to the `inherit` argument:
```{r}
AccumulatorChatty <- R6Class("AccumulatorChatty",
inherit = Accumulator,
public = list(
add = function(x = 1) {
cat("Adding ", x, "\n", sep = "")
super$add(x = x)
}
)
)
x2 <- AccumulatorChatty$new()
x2$add(10)$add(1)$sum
```
Note that `$add()` overrides the implementation in the superclass, but we can access the previous implementation through `super$`. Any methods which are overridden will automatically call the implementation in the parent class.
Like S3, R6 only supports single inheritance: you cannot supply a vector of classes to inherit.
### Introspection
Every R6 object has an S3 class that reflects the hierarchy of R6 classes. This means that the easiest way to determine the class (and all classes it inherits from) is to use `class()`:
```{r}
class(hadley3)
```
The S3 hierarchy includes the base "R6" class. This provides common behaviour, including an `print.R6()` method which calls `$print()` or `$format()`, as described above.
You can list all methods and fields with `names()`:
```{r}
names(hadley3)
```
There's one method that we haven't defined: `$clone()`. It's provided by R6 and we'll come back to it in [reference semantics].
### Exercises
1. Can subclasses access private fields/methods from their parent? Perform
an experiment to find out.
## Controlling access
`R6Class()` has two other arguments that work similarly to `public`: `private` and `active`. `private` allows you to create components that the user can not easily access, and `active` allows you to use accessor functions to define dynamic, or active, fields.
### Privacy
With R6 you can define __private__ fields and methods, elements that can only be accessed from within the class, not from the outside. There are two things that you need to know to take advantage of private elements:
* The `private` argument works in the same way as the `public` argument:
you give it a named list of methods (functions) and fields (everything else).
* Fields and methods defined in `private` are available within the methods
with `private$` instead of `self$`. You cannot access private fields or
methods outside of the class.
To make this concrete, we could make `$age` and `$name` fields of the Person class private. With this definition of `Person` we can only set `$age` and `$name` during object creation, and we cannot access their values from outside of the class.
```{r}
Person <- R6Class("Person",
public = list(
initialize = function(name, age = NA) {
private$name <- name
private$age <- age
},
print = function(...) {
cat("Person: \n")
cat(" Name: ", private$name, "\n", sep = "")
cat(" Age: ", private$age, "\n", sep = "")
}
),
private = list(
age = NA,
name = NULL
)
)
hadley4 <- Person$new("Hadley")
hadley4$name
```
The distinction between public and private fields is important when you create complex networks of classes, and you want to make it as clear as possible what it's ok for others to access. Anything that's private can be more easily refactored because you know others aren't relying on it. Private methods tend to be more important in other programming languages compared to R because the object hierarchies in R tend to be simpler.
### Active fields
Active fields make allow you to define components that look like fields from the outside, but are defined with functions, like methods. For example, we can define an active field `x` that returns a different value every time you access it:
```{r}
Rando <- R6::R6Class("Rando", active = list(
random = function(value) {
runif(1)
}
))
x <- Rando$new()
x$random
x$random
x$random
```
Active fields are particularly useful in conjunction with privacy, because they make it possible to implement components that work like fields from the outside but provide additional checks. For example, you can use them to implement read-only fields or fields that validate their inputs.
Active fields are implemented using active bindings from base R. Each active binding is a function that takes a single argument: `value`. If the argument is `missing()`, the value is being retrieved; otherwise it's being modified. We can use that idea to make a read-only `age` field, and to ensure that `name` is a length 1 character vector.
```{r, error = TRUE}
Person <- R6Class("Person",
private = list(
.age = NA,
.name = NULL
),
active = list(
age = function(value) {
if (missing(value)) {
private$.age
} else {
stop("`$age` is read only", call. = FALSE)
}
},
name = function(value) {
if (missing(value)) {
private$.name
} else {
stopifnot(is.character(name), length(name) == 1)
private$.name <- value
self
}
}
),
public = list(
initialize = function(name, age = NA) {
private$.name <- name
private$.age <- age
}
)
)
hadley5 <- Person$new("Hadley")
hadley5$name
hadley5$name <- 10
hadley5$age
hadley5$age <- 20
```
### Exercises
1. How would you define a write-only field?
## Reference semantics
One of the big differences between R6 and most other objects in R is that they have reference semantics. This is because they are S3 objects built on top of environments:
```{r}
typeof(x2)
```
The main consequence of reference semantics is that objects are not copied when modified:
```{r}
y1 <- Accumulator$new()
y2 <- y1
y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)
```
Instead, if you want a copy, you'll need to explicitly `$clone()` the object:
```{r}
y1 <- Accumulator$new()
y2 <- y1$clone()
y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)
```
(Note that `$clone()` does not recursively clone nested R6 objects. If you want that, you'll need to use `$clone(deep = TRUE)`. Note that this only clones R6 objects: if you have other fields with reference semantics (e.g. environments) you'll need to define your own `$clone()`.)
There are three other less obvious consequences:
* It is harder to reason about code that uses R6 objects because you need to
understand more context.
* It makes sense to think about when an R6 object is deleted, and you
can write a `finalizer()` to complement the `initializer()`.
* If one of the fields is an R6 class, you must call `$new()` inside
`$initialize()` not inside `R6Class()`.
These are described in more detail below.
### Reasoning
Generally, reference semantics makes code harder to reason about. Take this very simple example:
```{r, eval = FALSE}
x <- list(a = 1)
y <- list(b = 2)
z <- f(x, y)
```
For the vast majority of functions, you know that the final line only modifies `z`.
Take a similar equivalent that uses an imaginary `List` reference class:
```{r, eval = FALSE}
x <- List$new(a = 1)
y <- List$new(b = 2)
z <- f(x, y)
```
The final line is much harder to reason about - it's completely possible that `f()` calls methods of `x` or `y`, modifying them in place. This is the biggest potential downside of R6. The best way to ameliorate this problem is to avoid writing functions that both return a value and modify R6 inputs.
That said, modifying R6 inputs can lead to substantially simpler code in some cases. One challenge of working with immutable data is known as __threading state__: if you want to return a value that's modified in a deeply nested function, you need to return the modified value up through every function. This can complicate code, particularly if you need to modify multiple values. For example, ggplot2 uses R6 objects for scales. Scales are complex because they need to combine data across every facet and every layer. Using R6 makes the code substantially simpler, at the cost of introducing subtle bugs. Fixing those bugs required careful placement of calls to `$clone()` to ensure that independent plots didn't accidentally share scale data. We'll come back to this idea in [oo-tradeoffs].
### Finalizer
One useful property of reference semantics is that it makes sense to think about when an R6 object is __finalised__, i.e. when it's deleted. This doesn't make sense for S3 and S4 objects because copy-on-modify semantics mean that there may be many transient versions of an object. For example, in the following code, there are actually two factor objects: the second is created when the levels are modified, leaving the first to be destroyed at the next garbage collection.
```{r}
x <- factor(c("a", "b", "c"))
levels(x) <- c("c", "b", "a")
```
Since R6 objects are not copied-on-modify they will only get deleted once, and it makes sense to think about `$finalize()` as a complement to `$initialize()`. Finalizers usually play a similar role to `on.exit()`, cleaning up any resources created by the initializer. For example, the following class wraps up a temporary file, automatically deleting it when the class is finalised.
```{r}
TemporaryFile <- R6Class("TemporaryFile", list(
path = NULL,
initialize = function() {
self$path <- tempfile()
},
finalize = function() {
message("Cleaning up ", self$path)
unlink(self$path)
}
))
tf <- TemporaryFile$new()
```
The finalise method will be run when R exits, or by the first garbage collection after the object has been removed. Generally, this will happen when it happens, but it can occasionally be useful to force a run with an explicit call to `gc()`.
```{r}
rm(tf)
invisible(gc())
```
### R6 fields
A final consequence of reference semantics can crop up where you don't expect it. Beware of setting a default value to an R6 class: it will be shared across all instances of the object. This is because the child object is only initialized once, when you defined the class, not each time you call new.
```{r}
TemporaryDatabase <- R6Class("TemporaryDatabase", list(
con = NULL,
file = TemporaryFile$new(),
initialize = function() {
DBI::dbConnect(RSQLite::SQLite(), path = file$path)
}
))
db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()
db_a$file$path == db_b$file$path
```
You can fix this by creating the object in `$initialize()`:
```{r}
TemporaryDatabase <- R6Class("TemporaryDatabase", list(
con = NULL,
file = NULL,
initialize = function() {
self$file <- TemporaryFile$new()
DBI::dbConnect(RSQLite::SQLite(), path = file$path)
}
))
db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()
db_a$file$path == db_b$file$path
```
### Exercises