forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
R6.Rmd
615 lines (459 loc) · 22.9 KB
/
R6.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
# R6 {#r6}
```{r, include = FALSE}
source("common.R")
```
## Introduction
\index{R6}
This chapter describes the R6 OOP system. R6 has two special properties:
* It uses the encapsulated OOP paradigm, which means that methods belong to
objects, not generics, and you call them like `object$method()`.
* R6 objects are __mutable__, which means that they are modified in place, and
hence have reference semantics.
If you've learned OOP in another programming language, it's likely that R6 will feel very natural, and you'll be inclined to prefer it over S3. Resist the temptation to follow the path of least resistance: in most cases R6 will lead you to non-idiomatic R code. We'll come back to this theme in Section \@ref(s3-r6).
R6 is very similar to a base OOP system called __reference classes__, or RC for short. I describe why I teach R6 and not RC in Section \@ref(why-r6).
### Outline {-}
* Section \@ref(r6-classes) introduces `R6::R6class()`, the one function that
you need to know to create R6 classes. You'll learn about the constructor
method, `$new()`, which allows you to create R6 objects, as well as other
important methods like `$initialize()` and `$print()`.
* Section \@ref(r6-access) discusses the access mechanisms of R6: private and
active fields. Together, these allow you to hide data from the user, or
expose private data for reading but not writing.
* Section \@ref(r6-semantics) explores the consequences of R6's reference
semantics. You'll learn about the use of finalizers to automatically
clean up any operations performed in the initializer, and a common gotcha
if you use an R6 object as a field in another R6 object.
* Section \@ref(why-r6) describes why I cover R6, rather than the base RC
system.
### Prerequisites {-}
Because [R6](https://r6.r-lib.org) is not built into base R, you'll need to install and load the R6 package to use it:
```{r setup}
# install.packages("R6")
library(R6)
```
R6 objects have reference semantics which means that they are modified in-place, not copied-on-modify. If you're not familiar with these terms, brush up your vocab by reading Section \@ref(modify-in-place).
## Classes and methods {#r6-classes}
\index{R6!classes}
\index{classes!R6}
\indexc{self}
\index{R6!R6Class@\texttt{R6Class()}}
R6 only needs a single function call to create both the class and its methods: `R6::R6Class()`. This is the only function from the package that you'll ever use![^package]
[^package]: That means if you're creating R6 in a package, you only need to make sure it's listed in the `Imports` field of the `DESCRIPTION`. There's no need to import the package into the `NAMESPACE`.
The following example shows the two most important arguments to `R6Class()`:
* The first argument is the `classname`. It's not strictly needed, but it
improves error messages and makes it possible to use R6 objects with S3
generics. By convention, R6 classes have `UpperCamelCase` names.
* The second argument, `public`, supplies a list of methods (functions) and
fields (anything else) that make up the public interface of the object.
By convention, methods and fields use `snake_case`. Methods can access
the methods and fields of the current object via `self$`.[^python]
\index{methods!R6}
[^python]: Unlike in `this` in python, the `self` variable is automatically provided by R6, and does not form part of the method signature.
```{r}
Accumulator <- R6Class("Accumulator", list(
sum = 0,
add = function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})
)
```
You should always assign the result of `R6Class()` into a variable with the same name as the class, because `R6Class()` returns an R6 object that defines the class:
```{r}
Accumulator
```
\index{constructors!R6}
You construct a new object from the class by calling the `new()` method. In R6, methods belong to objects, so you use `$` to access `new()`:
```{r}
x <- Accumulator$new()
```
You can then call methods and access fields with `$`:
```{r}
x$add(4)
x$sum
```
In this class, the fields and methods are public, which means that you can get or set the value of any field. Later, we'll see how to use private fields and methods to prevent casual access to the internals of your class.
To make it clear when we're talking about fields and methods as opposed to variables and functions, I'll prefix their names with `$`. For example, the `Accumulate` class has field `$sum` and method `$add()`.
### Method chaining
\index{method chaining}
`$add()` is called primarily for its side-effect of updating `$sum`.
```{r}
Accumulator <- R6Class("Accumulator", list(
sum = 0,
add = function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})
)
```
Side-effect R6 methods should always return `self` invisibly. This returns the "current" object and makes it possible to chain together multiple method calls:
```{r}
x$add(10)$add(10)$sum
```
For, readability, you might put one method call on each line:
```{r}
x$
add(10)$
add(10)$
sum
```
This technique is called __method chaining__ and is commonly used in languages like Python and JavaScript. Method chaining is deeply related to the pipe, and we'll discuss the pros and cons of each approach in Section \@ref(tradeoffs-pipe).
### Important methods {#r6-important-methods}
\index{R6!methods!print}
\index{R6!methods!initialize}
There are two important methods that should be defined for most classes: `$initialize()` and `$print()`. They're not required, but providing them will make your class easier to use.
`$initialize()` overrides the default behaviour of `$new()`. For example, the following code defines an Person class with fields `$name` and `$age`. To ensure that that `$name` is always a single string, and `$age` is always a single number, I placed checks in `$initialize()`.
```{r, error = TRUE}
Person <- R6Class("Person", list(
name = NULL,
age = NA,
initialize = function(name, age = NA) {
stopifnot(is.character(name), length(name) == 1)
stopifnot(is.numeric(age), length(age) == 1)
self$name <- name
self$age <- age
}
))
hadley <- Person$new("Hadley", age = "thirty-eight")
hadley <- Person$new("Hadley", age = 38)
```
If you have more expensive validation requirements, implement them in a separate `$validate()` and only call when needed.
Defining `$print()` allows you to override the default printing behaviour. As with any R6 method called for its side effects, `$print()` should return `invisible(self)`.
```{r}
Person <- R6Class("Person", list(
name = NULL,
age = NA,
initialize = function(name, age = NA) {
self$name <- name
self$age <- age
},
print = function(...) {
cat("Person: \n")
cat(" Name: ", self$name, "\n", sep = "")
cat(" Age: ", self$age, "\n", sep = "")
invisible(self)
}
))
hadley2 <- Person$new("Hadley")
hadley2
```
This code illustrates an important aspect of R6. Because methods are bound to individual objects, the previously created `hadley` object does not get this new method:
```{r}
hadley
hadley$print
```
From the perspective of R6, there is no relationship between `hadley` and `hadley2`; they just coincidentally share the same class name. This doesn't cause problems when using already developed R6 objects but can make interactive experimentation confusing. If you're changing the code and can't figure out why the results of method calls aren't any different, make sure you've re-constructed R6 objects with the new class.
### Adding methods after creation
\index{R6!methods!adding extra}
Instead of continuously creating new classes, it's also possible to modify the fields and methods of an existing class. This is useful when exploring interactively, or when you have a class with many functions that you'd like to break up into pieces. Add new elements to an existing class with `$set()`, supplying the visibility (more on in Section \@ref(r6-access)), the name, and the component.
```{r, eval = FALSE}
Accumulator <- R6Class("Accumulator")
Accumulator$set("public", "sum", 0)
Accumulator$set("public", "add", function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})
```
As above, new methods and fields are only available to new objects; they are not retrospectively added to existing objects.
### Inheritance
\index{R6!inheritance}
\index{inheritance!R6}
To inherit behaviour from an existing class, provide the class object to the `inherit` argument:
```{r}
AccumulatorChatty <- R6Class("AccumulatorChatty",
inherit = Accumulator,
public = list(
add = function(x = 1) {
cat("Adding ", x, "\n", sep = "")
super$add(x = x)
}
)
)
x2 <- AccumulatorChatty$new()
x2$add(10)$add(1)$sum
```
`$add()` overrides the superclass implementation, but we can still delegate to the superclass implementation by using `super$`. (This is analogous to `NextMethod()` in S3, as discussed in Section \@ref(s3-inheritance).) Any methods which are not overridden will use the implementation in the parent class.
### Introspection
\index{R6!introspection}
Every R6 object has an S3 class that reflects its hierarchy of R6 classes. This means that the easiest way to determine the class (and all classes it inherits from) is to use `class()`:
```{r}
class(hadley2)
```
The S3 hierarchy includes the base "R6" class. This provides common behaviour, including a `print.R6()` method which calls `$print()`, as described above.
\index{R6!methods!listing}
You can list all methods and fields with `names()`:
```{r}
names(hadley2)
```
We defined `$name`, `$age`, `$print`, and `$initialize`. As suggested by the name, `.__enclos_env__` is an internal implementation detail that you shouldn't touch; we'll come back to `$clone()` in Section \@ref(r6-semantics).
### Exercises
1. Create a bank account R6 class that stores a balance and allows you to
deposit and withdraw money. Create a subclass that throws an error
if you attempt to go into overdraft. Create another subclass that allows
you to go into overdraft, but charges you a fee.
1. Create an R6 class that represents a shuffled deck of cards. You should be
able to draw cards from the deck with `$draw(n)`, and return all cards to
the deck and reshuffle with `$reshuffle()`. Use the following code to make
a vector of cards.
```{r}
suit <- c("♠", "♥", "♦", "♣")
value <- c("A", 2:10, "J", "Q", "K")
cards <- paste0(rep(value, 4), suit)
```
1. Why can't you model a bank account or a deck of cards with an S3 class?
1. Create an R6 class that allows you to get and set the current timezone.
You can access the current timezone with `Sys.timezone()` and set it
with `Sys.setenv(TZ = "newtimezone")`. When setting the time zone, make
sure the new time zone is in the list provided by `OlsonNames()`.
1. Create an R6 class that manages the current working directory.
It should have `$get()` and `$set()` methods.
1. Why can't you model the time zone or current working directory with an S3
class?
1. What base type are R6 objects built on top of? What attributes do they
have?
## Controlling access {#r6-access}
\index{R6!access control}
`R6Class()` has two other arguments that work similarly to `public`:
* `private` allows you to create fields and methods that are only available
from within the class, not outside of it.
* `active` allows you to use accessor functions to define dynamic, or
active, fields.
These are described in the following sections.
### Privacy
\index{R6!methods!private}
With R6 you can define __private__ fields and methods, elements that can only be accessed from within the class, not from the outside[^try-hard]. There are two things that you need to know to take advantage of private elements:
[^try-hard]: Because R is such a flexible language, it's technically still possible to access private values, but you'll have to try much harder, spelunking in to the details of R6's implementation.
* The `private` argument to `R6Class` works in the same way as the `public`
argument: you give it a named list of methods (functions) and fields
(everything else).
* Fields and methods defined in `private` are available within the methods
using `private$` instead of `self$`. You cannot access private fields or
methods outside of the class.
To make this concrete, we could make `$age` and `$name` fields of the Person class private. With this definition of `Person` we can only set `$age` and `$name` during object creation, and we cannot access their values from outside of the class.
```{r}
Person <- R6Class("Person",
public = list(
initialize = function(name, age = NA) {
private$name <- name
private$age <- age
},
print = function(...) {
cat("Person: \n")
cat(" Name: ", private$name, "\n", sep = "")
cat(" Age: ", private$age, "\n", sep = "")
}
),
private = list(
age = NA,
name = NULL
)
)
hadley3 <- Person$new("Hadley")
hadley3
hadley3$name
```
The distinction between public and private fields is important when you create complex networks of classes, and you want to make it as clear as possible what it's ok for others to access. Anything that's private can be more easily refactored because you know others aren't relying on it. Private methods tend to be less important in R compared to other programming languages because the object hierarchies in R tend to be simpler.
### Active fields
\index{R6!active fields}
\index{active bindings}
Active fields allow you to define components that look like fields from the outside, but are defined with functions, like methods. Active fields are implemented using __active bindings__ (Section \@ref(advanced-bindings)). Each active binding is a function that takes a single argument: `value`. If the argument is `missing()`, the value is being retrieved; otherwise it's being modified.
For example, you could make an active field `random` that returns a different value every time you access it:
```{r}
Rando <- R6::R6Class("Rando", active = list(
random = function(value) {
if (missing(value)) {
runif(1)
} else {
stop("Can't set `$random`", call. = FALSE)
}
}
))
x <- Rando$new()
x$random
x$random
x$random
```
\index{validators!R6}
Active fields are particularly useful in conjunction with private fields, because they make it possible to implement components that look like fields from the outside but provide additional checks. For example, we can use them to make a read-only `age` field, and to ensure that `name` is a length 1 character vector.
```{r, error = TRUE}
Person <- R6Class("Person",
private = list(
.age = NA,
.name = NULL
),
active = list(
age = function(value) {
if (missing(value)) {
private$.age
} else {
stop("`$age` is read only", call. = FALSE)
}
},
name = function(value) {
if (missing(value)) {
private$.name
} else {
stopifnot(is.character(value), length(value) == 1)
private$.name <- value
self
}
}
),
public = list(
initialize = function(name, age = NA) {
private$.name <- name
private$.age <- age
}
)
)
hadley4 <- Person$new("Hadley", age = 38)
hadley4$name
hadley4$name <- 10
hadley4$age <- 20
```
### Exercises
1. Create a bank account class that prevents you from directly setting the
account balance, but you can still withdraw from and deposit to. Throw
an error if you attempt to go into overdraft.
1. Create a class with a write-only `$password` field. It should have
`$check_password(password)` method that returns `TRUE` or `FALSE`, but
there should be no way to view the complete password.
1. Extend the `Rando` class with another active binding that allows you to
access the previous random value. Ensure that active binding is the only
way to access the value.
1. Can subclasses access private fields/methods from their parent? Perform
an experiment to find out.
## Reference semantics {#r6-semantics}
\index{reference semantics}
One of the big differences between R6 and most other objects is that they have reference semantics. The primary consequence of reference semantics is that objects are not copied when modified:
```{r}
y1 <- Accumulator$new()
y2 <- y1
y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)
```
Instead, if you want a copy, you'll need to explicitly `$clone()` the object:
```{r}
y1 <- Accumulator$new()
y2 <- y1$clone()
y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)
```
(`$clone()` does not recursively clone nested R6 objects. If you want that, you'll need to use `$clone(deep = TRUE)`.)
There are three other less obvious consequences:
* It is harder to reason about code that uses R6 objects because you need to
understand more context.
* It makes sense to think about when an R6 object is deleted, and you
can write a `$finalize()` to complement the `$initialize()`.
* If one of the fields is an R6 object, you must create it inside
`$initialize()`, not `R6Class()`.
These consequences are described in more detail below.
### Reasoning
Generally, reference semantics makes code harder to reason about. Take this very simple example:
```{r, eval = FALSE}
x <- list(a = 1)
y <- list(b = 2)
z <- f(x, y)
```
For the vast majority of functions, you know that the final line only modifies `z`.
Take a similar example that uses an imaginary `List` reference class:
```{r, eval = FALSE}
x <- List$new(a = 1)
y <- List$new(b = 2)
z <- f(x, y)
```
The final line is much harder to reason about: if `f()` calls methods of `x` or `y`, it might modify them as well as `z`. This is the biggest potential downside of R6 and you should take care to avoid it by writing functions that either return a value, or modify their R6 inputs, but not both. That said, doing both can lead to substantially simpler code in some cases, and we'll discuss this further in Section \@ref(threading-state).
### Finalizer
\index{R6!methods!finalizer}
\index{finalizers}
One useful property of reference semantics is that it makes sense to think about when an R6 object is __finalized__, i.e. when it's deleted. This doesn't make sense for most objects because copy-on-modify semantics mean that there may be many transient versions of an object, as alluded to in Section \@ref(gc). For example, the following creates two factor objects: the second is created when the levels are modified, leaving the first to be destroyed by the garbage collector.
```{r}
x <- factor(c("a", "b", "c"))
levels(x) <- c("c", "b", "a")
```
Since R6 objects are not copied-on-modify they are only deleted once, and it makes sense to think about `$finalize()` as a complement to `$initialize()`. Finalizers usually play a similar role to `on.exit()` (as described in Section \@ref(on-exit)), cleaning up any resources created by the initializer. For example, the following class wraps up a temporary file, automatically deleting it when the class is finalized.
```{r}
TemporaryFile <- R6Class("TemporaryFile", list(
path = NULL,
initialize = function() {
self$path <- tempfile()
},
finalize = function() {
message("Cleaning up ", self$path)
unlink(self$path)
}
))
```
The finalize method will be run when the object is deleted (or more precisely, by the first garbage collection after the object has been unbound from all names) or when R exits. This means that the finalizer can be called effectively anywhere in your R code, and therefore it's almost impossible to reason about finalizer code that touches shared data structures. Avoid these potential problems by only using the finalizer to clean up private resources allocated by initializer.
```{r, result = FALSE}
tf <- TemporaryFile$new()
rm(tf)
#> Cleaning up /tmp/Rtmpk73JdI/file155f31d8424bd
```
### R6 fields
\index{mutable default arguments}
A final consequence of reference semantics can crop up where you don't expect it. If you use an R6 class as the default value of a field, it will be shared across all instances of the object! Take the following code: we want to create a temporary database every time we call `TemporaryDatabase$new()`, but the current code always uses the same path.
```{r}
TemporaryDatabase <- R6Class("TemporaryDatabase", list(
con = NULL,
file = TemporaryFile$new(),
initialize = function() {
self$con <- DBI::dbConnect(RSQLite::SQLite(), path = file$path)
},
finalize = function() {
DBI::dbDisconnect(self$con)
}
))
db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()
db_a$file$path == db_b$file$path
```
(If you're familiar with Python, this is very similar to the "mutable default argument" problem.)
The problem arises because `TemporaryFile$new()` is called only once when the `TemporaryDatabase` class is defined. To fix the problem, we need to make sure it's called every time that `TemporaryDatabase$new()` is called, i.e. we need to put it in `$initialize()`:
```{r}
TemporaryDatabase <- R6Class("TemporaryDatabase", list(
con = NULL,
file = NULL,
initialize = function() {
self$file <- TemporaryFile$new()
self$con <- DBI::dbConnect(RSQLite::SQLite(), path = file$path)
},
finalize = function() {
DBI::dbDisconnect(self$con)
}
))
db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()
db_a$file$path == db_b$file$path
```
### Exercises
1. Create a class that allows you to write a line to a specified file.
You should open a connection to the file in `$initialize()`, append a
line using `cat()` in `$append_line()`, and close the connection in
`$finalize()`.
## Why R6? {#why-r6}
\index{reference classes}
\index{R6!versus reference classes}
R6 is very similar to a built-in OO system called __reference classes__, or RC for short. I prefer R6 to RC because:
* R6 is much simpler. Both R6 and RC are built on top of environments, but
while R6 uses S3, RC uses S4. This means to fully understand RC, you need
to understand how the more complicated S4 works.
* R6 has comprehensive online documentation at <https://r6.r-lib.org>.
* R6 has a simpler mechanism for cross-package subclassing, which just
works without you having to think about it. For RC, read the details in the
"External Methods; Inter-Package Superclasses" section of `?setRefClass`.
* RC mingles variables and fields in the same stack of environments so that you
get (`field`) and set (`field <<- value`) fields like regular values. R6 puts
fields in a separate environment so you get (`self$field`) and set
(`self$field <- value`) with a prefix. The R6 approach is more verbose but
I like it because it is more explicit.
* R6 is much faster than RC. Generally, the speed of method dispatch is not
important outside of microbenchmarks. However, RC is quite slow, and switching
from RC to R6 led to a substantial performance improvement in the shiny package.
For more details, see `vignette("Performance", "R6")`.
* RC is tied to R. That means if any bugs are fixed, you can only take
advantage of the fixes by requiring a newer version of R. This makes it
difficult for packages (like those in the tidyverse) that need to work across
many R versions.
* Finally, because the ideas that underlie R6 and RC are similar, it will only
require a small amount of additional effort to learn RC if you need to.