---
title: OO field guide
layout: default
---
# OO field guide
## Introduction
This chapter is a field guide for recognising and working with R's objects in the wild. R has three object-oriented systems (plus the base types), so it can be a bit intimidating. The goal of this guide is not to make you an expert in all four systems, but to help you identify which system you're working with and to help you use it effectively.
Central to any object-oriented system are the concepts of class and method. A __class__ defines the behaviour of __objects__ by describing their attributes and their relationship to other classes. The class is also used when selecting __methods__, functions that behave differently depending on the class of their input. Classes are usually organised in a hierarchy: if a method does not exist for a child, then the parent's method is used instead. This means the child __inherits__ behaviour from the parent.
R's three OO systems differ in how classes and methods are defined:
* __S3__ implements a style of OO programming called generic-function OO. This is different from most programming languages, like Java, C++ and C#, which implement message-passing OO. With message-passing, messages (methods) are sent to objects and the object determines which function to call. Typically, this object has a special appearance in the method call, usually appearing before the name of the method/message: e.g. `canvas.drawRect("blue")`. S3 is different. While computations are still carried out via methods, a special type of function called a __generic function__ decides which method to call, e.g., `drawRect(canvas, "blue")`. S3 is a very casual system. It has no formal definition of classes.
* __S4__ works similarly to S3, but is more formal. There are two major differences to S3. S4 has formal class definitions, which describe the representation and inheritance for each class, and has special helper functions for defining generics and methods. S4 also has multiple dispatch, which means that generic functions can pick methods based on the class of any number of arguments, not just one.
* __Reference classes__, called RC for short, are quite different from S3 and S4. RC implements message-passing OO, so methods belong to classes, not functions. `$` is used to separate objects and methods, so method calls look like `canvas$drawRect("blue")`. RC objects are also mutable: they don't use R's usual copy-on-modify semantics, but are modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve with S3 or S4.
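To make the contrast between the two calling styles concrete, here is a minimal sketch of the same operation written both ways. The names `draw_rect()`, `canvas_s3`, `CanvasRC` and `drawRect()` are made up for illustration and are not part of any package.

```{r}
library(methods)

# Generic-function style (S3): the generic picks the method from the class
draw_rect <- function(shape, colour) UseMethod("draw_rect")
draw_rect.canvas_s3 <- function(shape, colour) {
  paste("S3 method drew a", colour, "rectangle")
}
canvas_s3 <- structure(list(), class = "canvas_s3")
draw_rect(canvas_s3, "blue")

# Message-passing style (RC): the method belongs to the object's class
CanvasRC <- setRefClass("CanvasRC",
  methods = list(
    drawRect = function(colour) {
      paste("RC method drew a", colour, "rectangle")
    }
  )
)
canvas_rc <- CanvasRC$new()
canvas_rc$drawRect("blue")
```

In the first call the object is just another argument; in the second it appears before the method name.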
There's also one other system that's not quite OO, but it's important to mention here:
* __base types__, the internal C-level types that underlie the other OO systems. Base types are mostly manipulated using C code, but they're important to know about because they provide the building blocks for the other OO systems.
The following sections describe each system in turn, starting with base types. You'll learn how to recognise the OO system that an object belongs to, how method dispatch works, and how to create new objects, classes, generics and methods for that system. The chapter concludes with a few remarks on when to use each system.
## Base types
Underlying every R object is a C structure (or struct) that describes how that object is stored in memory. The struct includes the contents of the object, the information needed for memory management, and most importantly for this section, a __type__. This is the __base type__ of an R object. Base types are not really an object system. In fact, only the R core team can create new types. This is because every new type makes base R a little more complicated. As a result, new base types are added very rarely: the most recent change, in 2011, added two exotic types that you never see in R, but are useful for diagnosing memory problems (`NEWSXP` and `FREESXP`). Prior to that, a special base type for S4 objects (`S4SXP`) was added in 2005.
[Data structures](#data-structures) explains the most common base types (atomic vectors and lists), but base types also encompass functions, environments and other more exotic objects like names, calls and promises that you'll learn about later in the book. You can determine an object's base type with `typeof()` (see a complete list of types in `?typeof`). Be aware that the names of base types are not used consistently throughout R: the type and the corresponding "is" function may have different names:
```{r}
# The type of a function is "closure", and the
# type of a primitive function is "builtin"
f <- function() {}
typeof(f)
is.function(f)
typeof(sum)
is.primitive(sum)
```
You may have heard of `mode()` and `storage.mode()`. I recommend ignoring these functions because they're just aliases of the names returned by `typeof()`, and exist solely for S compatibility. Their source code is simple. So if you want to know exactly what they do, I recommend reading it.
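If you do run into them, a quick comparison shows where the three functions agree and where the S-compatible names differ. This is only a sketch; `int_vec` and `sym` are throwaway names.

```{r}
int_vec <- 1:3
typeof(int_vec)        # "integer"
mode(int_vec)          # "numeric": integer and double share one mode
storage.mode(int_vec)  # "integer"

sym <- quote(a_name)
typeof(sym)            # "symbol"
mode(sym)              # "name"
```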
Functions that behave differently for different base types are almost always written in C, where dispatch occurs using switch statements (e.g. `switch(TYPEOF(x))`). Even if you never write C code, it's important to understand base types because everything else is built on top of them: S3 objects can be built on top of any base type, S4 objects use a special base type, and RC objects are a combination of S4 and environments (another base type). To see if an object is a pure base type, i.e. it doesn't also have S3, S4 or RC behaviour, check that `is.object(x)` returns `FALSE`.
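As a small illustration of that last check, the calls below apply `is.object()` to two pure base types and two objects that carry an S3 class:

```{r}
is.object(1:10)                  # FALSE: a bare integer vector
is.object(sum)                   # FALSE: a bare builtin function
is.object(factor(c("a", "b")))   # TRUE: has the S3 class "factor"
is.object(data.frame(x = 1))     # TRUE: has the S3 class "data.frame"
```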
## S3
S3 is R's first and simplest OO system. It is the only OO system used in the base and stats packages, and it's the most commonly used system in packages. S3 is informal and ad hoc, but it has a certain elegance in its minimalism: you couldn't take any part of it away and still have a useful OO system.
### Recognising objects, generic functions and methods
Most objects that you encounter are S3 objects. You can check this with `is.object(x) & !isS4(x)` (the second expression tests whether it's _not_ an S4 object). An easier way to determine the OO system of an object is to use `pryr::otype()`:
```{r}
library(pryr)
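# Since R 4.0.0 strings are no longer converted to factors by default,
# so ask for the conversion explicitly
df <- data.frame(x = 1:10, y = letters[1:10], stringsAsFactors = TRUE)
otype(df) # A data frame is an S3 class
otype(df$x) # An integer vector isn't
otype(df$y) # A factor is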
```
In S3, methods are associated with functions, called __generic functions__ or generics for short, not objects or classes. This is different from most other programming languages, but is a legitimate OO style.
To determine if a function is an S3 generic, you can look at its source code for a call to `UseMethod()`: that's the function that figures out the correct method to call, the process of __method dispatch__. Similar to `otype()`, pryr also provides `ftype()` which describes the object system, if any, associated with a function:
```{r}
mean
ftype(mean)
```
Some S3 generics, like `[`, `sum` and `cbind`, don't call `UseMethod()` because they are implemented in C. Instead, they call the C functions `DispatchGroup()` or `DispatchOrEval()`. Functions that do method dispatch in C code are called __internal generics__ and are documented in `?"internal generic"`. `ftype()` knows about these special cases too.
The job of an S3 generic is to call an S3 method specialised for a given class. You can recognise S3 methods by their names, which look like `generic.class()`. For example, the Date method for the `mean()` generic is called `mean.Date()`, and the factor method for `print()` is called `print.factor()`. Most modern style guides discourage the use of `.` in function names because it makes them look like S3 methods. For example, is `t.test()` the `test` method for `t` objects? Similarly, the use of `.` in class names can also be confusing: is `print.data.frame()` the `print()` method for `data.frames`, or the `print.data()` method for `frames`? `pryr::ftype()` knows about these exceptions, so you can use it to figure out if a function is an S3 method or generic:
```{r}
ftype(t.data.frame) # data frame method for t()
ftype(t.test) # generic function for t tests
ftype(is.numeric) # naming convention for testing and coercion
ftype(as.numeric) # there are no S3 generics for is and as
```
You can see all the methods of a generic using the `methods()` function:
```{r}
methods("mean")
methods("t.test")
```
(Apart from methods defined in the base package, most S3 methods will not be visible: use `getS3method()` to read their source code.)
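For example, here is a short sketch of retrieving a method that you cannot reach by typing its name. It assumes, as is the case in current versions of R, that `t.test.default()` lives in the stats namespace without being exported; `t_default` is just a local name.

```{r}
# Fetch the default method of t.test() by generic and class name
t_default <- getS3method("t.test", "default")
is.function(t_default)

# optional = TRUE returns NULL instead of an error when no method exists
getS3method("t.test", "no_such_class", optional = TRUE)
```

Printing `t_default` shows the source code of the method.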
You can also list all generics that have a method for a given class:
```{r}
methods(class = "ts")
```
As you'll learn in the following section, because there's no central repository, there's no way to list all S3 classes.
### Defining classes and creating objects
S3 is a simple and ad hoc system; it has no formal definition of a class. To make an object an instance of a class, you just take an existing base object and set the class attribute. You can do that during creation with `structure()`, or after the fact with `attr<-()`. However, if you're modifying an existing object, using `class<-()` will more clearly communicate your intent:
```{r}
# Create and assign class in one step
foo <- structure(list(), class = "foo")
# Create, then set class
foo <- list()
class(foo) <- "foo"
```
S3 objects are usually built on top of lists, or atomic vectors with attributes. (You can refresh your memory of attributes in [attributes](#attributes).) You can also turn functions into S3 objects. Other base types are either rarely seen in R, or have unusual semantics that don't work well with attributes.
You can determine the class of any object using `class(x)`, and see if an object inherits from a specific class using `inherits(x, "classname")`.
```{r}
class(foo)
inherits(foo, "foo")
```
The class of an S3 object can be a vector, which describes behaviour from most to least specific. For example, the class of the `glm()` object is `c("glm", "lm")` indicating that generalised linear models inherit behaviour from linear models. Class names are usually lower case, and you should avoid `.`. Otherwise, opinion is mixed whether to use underscores (`my_class`) or CamelCase (`MyClass`) for multi-word class names.
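You can see the class vector at work with a small sketch. The model below is fitted only so that there is a `glm` object to inspect; `mod_glm` is an arbitrary name.

```{r}
mod_glm <- glm(mpg ~ wt, data = mtcars)
class(mod_glm)                    # "glm" "lm"
inherits(mod_glm, "lm")           # TRUE: "lm" appears in the class vector
inherits(mod_glm, "data.frame")   # FALSE

# which = TRUE reports the position of each class in the vector
inherits(mod_glm, c("lm", "glm"), which = TRUE)   # 2 1
```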
Most S3 classes provide a constructor function:
```{r}
foo <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  structure(list(x), class = "foo")
}
```
You should use it if it's available (like for `factor()` and `data.frame()`). This ensures that you're creating the class with the correct components. Constructor functions usually have the same name as the class.
Apart from developer supplied constructor functions, S3 has no checks for correctness. This means you can change the class of existing objects:
```{r}
# Create a linear model
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
print(mod)
# Turn it into a data frame (?!)
class(mod) <- "data.frame"
# But unsurprisingly this doesn't work very well
print(mod)
# However, the data is still there
mod$coefficients
```
If you've used other OO languages, this might make you feel queasy. But surprisingly, this flexibility causes few problems: while you _can_ change the class of an object, you never should. R doesn't protect you from yourself: you can easily shoot yourself in the foot. As long as you don't aim the gun at your foot and pull the trigger, you won't have a problem.
### Creating new methods and generics
To add a new generic, create a function that calls `UseMethod()`. `UseMethod()` takes two arguments: the name of the generic function, and the argument to use for method dispatch. If you omit the second argument it will dispatch on the first argument to the function. There's no need to pass any of the arguments of the generic to `UseMethod()`. In fact, you shouldn’t do so. `UseMethod()` uses black magic to find them out for itself.
```{r}
f <- function(x) UseMethod("f")
```
A generic isn't useful without some methods. To add a method, you just create a regular function with the correct (`generic.class`) name:
```{r}
f.a <- function(x) "Class a"
a <- structure(list(), class = "a")
class(a)
f(a)
```
Adding a method to an existing generic works in the same way:
```{r}
mean.a <- function(x) "a"
mean(a)
```
As you can see, there's no check to make sure that the method returns a class compatible with the generic. It's up to you to make sure that your method doesn't violate the expectations of existing code.
### Method dispatch
S3 method dispatch is relatively simple. `UseMethod()` creates a vector of function names, like `paste0("generic", ".", c(class(x), "default"))` and looks for each in turn. The "default" class makes it possible to set up a fall back method for otherwise unknown classes.
```{r}
f <- function(x) UseMethod("f")
f.a <- function(x) "Class a"
f.default <- function(x) "Unknown class"
f(structure(list(), class = "a"))
# No method for b class, so uses method for a class
f(structure(list(), class = c("b", "a")))
# No method for c class, so falls back to default
f(structure(list(), class = "c"))
```
Group generic methods add a little more complexity. Group generics make it possible to implement methods for multiple generics with one function. The four group generics and the functions they include are:
* Math: `abs`, `sign`, `sqrt`, `floor`, `cos`, `sin`, `log`, `exp`, ...
* Ops: `+`, `-`, `*`, `/`, `^`, `%%`, `%/%`, `&`, `|`, `!`, `==`, `!=`, `<`, `<=`, `>=`, `>`
* Summary: `all`, `any`, `sum`, `prod`, `min`, `max`, `range`
* Complex: `Arg`, `Conj`, `Im`, `Mod`, `Re`
Group generics are a relatively advanced technique and are beyond the scope of this chapter but you can find out more about them in `?groupGeneric`. The most important thing to take away from this is to recognise that `Math`, `Ops`, `Summary` and `Complex` aren't real functions. They represent groups of functions. Note that inside a group generic method a special variable `.Generic` provides the name of the actual generic function called.
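To give a flavour of how this looks in practice, here is a minimal sketch of a `Math` group method. The `money` class and its constructor are hypothetical, invented purely for this example.

```{r}
# A hypothetical class built on top of a double vector
money <- function(x) structure(x, class = "money")

# One method covers every function in the Math group.
# .Generic holds the name of the function that was actually called.
Math.money <- function(x, ...) {
  message("Applying ", .Generic, "() to a money object")
  money(get(.Generic)(unclass(x), ...))
}

cash <- money(c(1.25, 9.99))
floor(cash)
sqrt(cash)
```

Both calls end up in `Math.money()`, which strips the class, applies the base function and restores the class afterwards.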
If you have complex class hierarchies it's sometimes useful to call the "parent" method. It's a little bit tricky to define exactly what that means, but it's basically the method that would have been called if the current method did not exist. Again, this is an advanced technique: you can read about it in `?NextMethod`.
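A small sketch makes the idea easier to see. The generic `describe()` and the classes `child` and `parent` are made up for illustration.

```{r}
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "an object"
describe.parent <- function(x) paste("a parent, which is", NextMethod())
describe.child <- function(x) paste("a child, which is", NextMethod())

kid <- structure(list(), class = c("child", "parent"))
describe(kid)
```

Each call to `NextMethod()` moves one step along the class vector, ending at the default method.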
Because methods are normal R functions, you can call them directly. However, this is just as dangerous as changing the class of an object, so you shouldn't do it: please don't point the loaded gun at your foot! (The only reason to call the method directly is that sometimes when you're writing OO code, not using someone else's, you can get considerable improvements in speed by skipping regular method dispatch.)
```{r}
c <- structure(list(), class = "c")
# Call the correct method:
f.default(c)
# Or force R to call the wrong method:
f.a(c)
```
You can also call an S3 generic with a non-S3 object. Non-internal S3 generics will dispatch on the __implicit class__ of base types. (Internal generics don't do that for performance reasons.) The rules to determine the implicit class of a primitive type are somewhat complex, but are shown in the function below:
```{r}
iclass <- function(x) {
  if (is.object(x)) stop("x is not a primitive type", call. = FALSE)
  c(
    if (is.matrix(x)) "matrix",
    if (is.array(x) && !is.matrix(x)) "array",
    if (is.double(x)) "double",
    if (is.integer(x)) "integer",
    mode(x)
  )
}
iclass(matrix(1:5))
iclass(array(1.5))
```
### Exercises
* Read the source code for `t()` and `t.test()` and confirm that `t.test()` is an S3 generic and not an S3 method. What happens if you create an object with class `test` and call `t()` with it?
* What classes have a method for the `Math` group generic in base R? Read the source code. How do the methods work?
* R has two classes for representing date time data, `POSIXct` and `POSIXlt` which both inherit from `POSIXt`. Which generics have different behaviours for the two classes? Which generics share the same behaviour?
* Which base generic has the greatest number of defined methods?
* `UseMethod()` calls methods in a special way. Predict what the following code will return, then run it and read the help for `UseMethod()` to figure out what's going on. Write down the rules in the simplest form possible.
```{r, eval = FALSE}
y <- 1
g <- function(x) {
  y <- 2
  UseMethod("g")
}
g.numeric <- function(x) y
g(10)
h <- function(x) {
  x <- 10
  UseMethod("h")
}
h.character <- function(x) paste("char", x)
h.numeric <- function(x) paste("num", x)
h("a")
```
* Internal generics don't dispatch on the implicit class of base types. Carefully read `?"internal generic"` to determine why the lengths of `f` and `g` differ in the example below. What function helps distinguish between the behaviour of `f` and `g`?
```{r, eval = FALSE}
f <- function() 1
g <- function() 2
class(g) <- "function"
class(f)
class(g)
length.function <- function(x) "function"
length(f)
length(g)
```
## S4
S4 works in a similar way to S3, but it adds formality and rigour. Methods still belong to functions, not classes, but:
* Classes have a formal definition, describing their fields and inheritance structure (parent classes).
* Method dispatch can be based on multiple arguments to a generic function, not just one.
* There is a special operator, `@`, for extracting fields (aka slots) out of an S4 object.
All S4 related code is stored in the `methods` package. This package is always available when you're running R interactively, but is not always loaded automatically when running R in batch mode. For this reason, it's a good idea to include an explicit `library(methods)` whenever you're using S4.
S4 is a rich and complex system, and there's no way to explain it fully in a few pages. Here I'll focus on the key ideas underlying S4 so that you can use existing S4 objects effectively. To learn more, some good references are:
* [S4 system development in Bioconductor](http://www.bioconductor.org/help/course-materials/2010/AdvancedR/S4InBioconductor.pdf)
* John Chambers' [Software for Data Analysis](http://amzn.com/0387759352?tag=devtools-20)
* [Martin Morgan's answers to S4 questions on stackoverflow](http://stackoverflow.com/search?tab=votes&q=user%3a547331%20%5bs4%5d%20is%3aanswe)
### Recognising objects, generic functions and methods
Recognising S4 objects, generics and methods is easy. You can identify an S4 object because `str()` describes it as a "formal" class, `isS4()` is true, and `pryr::otype()` returns "S4". S4 generics and methods are also easy to identify because they are S4 objects with well defined classes.
There aren't any S4 classes in the commonly used base packages (stats, graphics, utils, datasets, and base), so we'll start by creating an S4 object from the built-in stats4 package, which provides some S4 classes and methods associated with maximum likelihood estimation:
```{r}
library(stats4)
# From example(mle)
y <- c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8)
nLL <- function(lambda) - sum(dpois(y, lambda, log = TRUE))
fit <- mle(nLL, start = list(lambda = 5), nobs = length(y))
# An S4 object
isS4(fit)
otype(fit)
# An S4 generic
isS4(nobs)
ftype(nobs)
# Retrieve an S4 method, described later
mle_nobs <- method_from_call(nobs(fit))
isS4(mle_nobs)
ftype(mle_nobs)
```
You can determine the class of an S4 object with `class()` and test if an object inherits from a specific class with `is()`:
```{r}
class(fit)
is(fit, "mle")
```
You can get a list of all S4 generics with `getGenerics()`, and a list of all S4 classes with `getClasses()`, but note that this list includes shim classes for S3 classes and base types. You can list all S4 methods with `showMethods()`, optionally restricting the selection by generic (the `f` argument), by class (the `classes` argument), or both. It's also a good idea to supply `where = search()` to restrict to methods available from the global environment.
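As a rough sketch of these lookups, the calls below ask a few questions about the `nobs()` generic and the `mle` class from stats4. `isGeneric()` and `existsMethod()` also come from the methods package.

```{r}
library(methods)
library(stats4)

isGeneric("nobs")                         # is nobs() an S4 generic?
existsMethod("nobs", "mle")               # is there a method for class "mle"?
"mle" %in% getClasses("package:stats4")   # is "mle" defined by stats4?

showMethods("nobs")                       # list methods for one generic
showMethods(classes = "mle")              # list methods for one class
```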
### Defining classes and creating objects
In S3, you can turn any object into an object of a particular class just by setting the class attribute. S4 is much stricter: you must define the representation of the class using `setClass()`, and create a new object with `new()`. You can find the documentation for a class with a special syntax: `class?className`, e.g. `class?mle`.
An S4 class has three key properties:
* A __name__: an alpha-numeric class identifier. S4 class names should
use UpperCamelCase.
* A named list of __slots__ (fields), providing slot names and
permitted classes. For example, a person class might be represented by a
character name and a numeric age: `list(name = "character", age = "numeric")`.
* A string giving the class it inherits from, or in S4 terminology,
that it __contains__. You can provide multiple classes for multiple
inheritance, but this is an advanced technique and it adds much
complexity.
In `slots` and `contains` you can use S4 classes, S3 classes registered
with `setOldClass()`, or the implicit class of a base type. In `slots`
you can also use the special class `ANY` which does not restrict the input.
S4 classes have other optional properties like a `validity` method that tests if an object is valid, and a `prototype` that defines slot default values. See `?setClass` for more details.
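Here is a minimal sketch of both properties on a hypothetical `Temperature` class (the class and its `celsius` slot are invented for this example). The validity function returns `TRUE` for a valid object and a string describing the problem otherwise.

```{r}
library(methods)

setClass("Temperature",
  slots = list(celsius = "numeric"),
  prototype = list(celsius = 20),
  validity = function(object) {
    if (length(object@celsius) != 1 || object@celsius < -273.15) {
      "celsius must be a single value at or above absolute zero"
    } else {
      TRUE
    }
  }
)

# The prototype supplies the default slot value
new("Temperature")@celsius

# The validity method rejects impossible values; capture the error message
bad <- tryCatch(
  new("Temperature", celsius = -300),
  error = function(e) conditionMessage(e)
)
bad
```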
The following example creates a Person class with fields name and age, and an Employee class that inherits from Person. The Employee class inherits the slots and methods from the Person, and adds an additional slot, boss. To create objects we call `new()` with the name of the class, and name-value pairs of slot values.
```{r}
setClass("Person",
  slots = list(name = "character", age = "numeric"))
setClass("Employee",
  slots = list(boss = "Person"),
  contains = "Person")
alice <- new("Person", name = "Alice", age = 40)
john <- new("Employee", name = "John", age = 20, boss = alice)
```
Most S4 classes also come with a constructor function with the same name as the class: if that exists, use it instead of calling `new()` directly.
To access slots of an S4 object you use `@` or `slot()`:
```{r}
alice@age
slot(john, "boss")
```
(`@` is equivalent to `$`, and `slot()` to `[[`.)
If an S4 object contains (inherits from) an S3 class or a base type, it will have a special `.Data` slot which contains the underlying base type or S3 object:
```{r}
setClass("RangedNumeric",
  contains = "numeric",
  slots = list(min = "numeric", max = "numeric"))
rn <- new("RangedNumeric", 1:10, min = 1, max = 10)
rn@min
# The underlying numeric vector lives in the .Data slot
rn@.Data
```
Since R is an interactive programming language, it's possible to create new classes or redefine existing classes at any time. This can be a problem when you're interactively experimenting with S4. If you modify a class, make sure you also recreate any objects of that class, otherwise you'll end up with invalid objects.
### Creating new methods and generics
S4 provides special functions for creating new generics and methods. `setGeneric()` will create a new generic or convert an existing function into a generic. `setMethod()` takes the name of the generic, the classes the method should be associated with and a function that implements the method. For example, we could take `union()`, which usually just works on vectors, and make it work with data frames:
```{r}
setGeneric("union")
setMethod("union",
  c(x = "data.frame", y = "data.frame"),
  function(x, y) {
    unique(rbind(x, y))
  }
)
```
If you create a new generic from scratch, you also need to supply a function that calls `standardGeneric()`:
```{r}
setGeneric("myGeneric", function(x) {
  standardGeneric("myGeneric")
})
```
### Method dispatch
S4 method dispatch is the same as S3 dispatch if your classes only inherit from a single parent, and you only dispatch on one class. The main difference is how you set up default values: S4 uses the special class `ANY` to match any class and "missing" to match a missing argument. Like S3, S4 also has group generics, documented in `?S4groupGeneric`, and a way to call the "parent" method, `callNextMethod()`.
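The following sketch shows `ANY` and `"missing"` in a method signature. The generic `describeArg()` is hypothetical and exists only for this example.

```{r}
library(methods)

setGeneric("describeArg", function(x, y) standardGeneric("describeArg"))

# Chosen when x is numeric and y is not supplied
setMethod("describeArg", signature(x = "numeric", y = "missing"),
  function(x, y) "numeric x, no y")

# Fallback for any other combination of classes
setMethod("describeArg", signature(x = "ANY", y = "ANY"),
  function(x, y) "fallback")

describeArg(1)
describeArg("a", 2)
```

The first call matches the more specific method; the second has no specific method, so it falls back to the `ANY` one.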
Method dispatch becomes considerably more complicated if you dispatch on multiple arguments, or your classes use multiple inheritance. The rules are described in `?Methods`, but they are complicated and it's difficult to predict which method will be called. For this reason, I strongly recommend avoiding multiple inheritance and multiple dispatch unless absolutely necessary.
Finally, there are two functions that identify the method that will be invoked given the specification of a call to a generic:
```{r, eval = FALSE}
# From methods: takes generic name and class names
selectMethod("nobs", list("mle"))
# From pryr: takes an unevaluated function call
method_from_call(nobs(fit))
```
### Exercises
* Which S4 generic has the most methods defined for it? Which S4 class has the most methods associated with it?
* What happens if you define a new S4 class that doesn't "contain" an existing class? (Hint: read about virtual classes in `?Classes`)
* What happens if you pass an S4 object to an S3 generic? What happens if you pass an S3 object to an S4 generic? (Hint: read `?setOldClass` for the second case)
## RC
Reference classes (or RC for short) are the newest OO system in R. They were introduced in version 2.12. They are fundamentally different to S3 and S4 because:
* RC methods belong to objects, not functions
* RC objects are mutable: the usual R copy-on-modify semantics do not apply
These properties make RC objects behave more like objects do in most other programming languages, e.g., Python, Ruby, Java and C#. Surprisingly, reference classes are implemented in R, not C: they are a special S4 class that wraps around an environment.
### Defining classes and creating objects
Since there aren't any reference classes provided by the base R packages, we'll start by creating one. RC classes are best used for describing stateful objects, objects that change over time, so we'll create a simple class to model a bank account.
Creating a new RC class is similar to creating a new S4 class, but you use `setRefClass()` instead of `setClass()`. The first, and only required argument, is an alpha-numeric __name__. While you can use `new()` to create new RC objects, it's good style to use the object returned by `setRefClass()` to generate new objects. (You can also do that with S4 classes, but it's less common.)
```{r}
Account <- setRefClass("Account")
Account$new()
```
`setRefClass()` also accepts a list of name-class pairs that define class __fields__ (equivalent to S4 slots). Additional named arguments passed to `new()` will set initial values of the fields. You can get and set field values with `$`:
```{r}
Account <- setRefClass("Account",
  fields = list(balance = "numeric"))
a <- Account$new(balance = 100)
a$balance
a$balance <- 200
a$balance
```
Instead of supplying a class name for the field, you can provide a single argument function which will act as an accessor method. This allows you to add custom behaviour when getting or setting a field. See `?setRefClass` for more details.
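As a sketch of what an accessor function looks like, the hypothetical `Thermometer` class below stores a temperature in a regular `celsius` field and exposes a computed `fahrenheit` field. The function is called with no argument when the field is read and with the new value when it is assigned.

```{r}
library(methods)

Thermometer <- setRefClass("Thermometer",
  fields = list(
    celsius = "numeric",
    # Accessor function: no argument means "get", one argument means "set"
    fahrenheit = function(value) {
      if (missing(value)) {
        celsius * 9 / 5 + 32
      } else {
        celsius <<- (value - 32) * 5 / 9
      }
    }
  )
)

thermo <- Thermometer$new(celsius = 100)
thermo$fahrenheit
thermo$fahrenheit <- 32
thermo$celsius
```

Setting `fahrenheit` never stores a value of its own; it only updates `celsius`.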
Note that RC objects are __mutable__, i.e., they have reference semantics, and are not copied-on-modify:
```{r}
b <- a
b$balance
a$balance <- 0
b$balance
```
For this reason, RC objects come with a `copy()` method that allows you to make a copy of the object:
```{r}
c <- a$copy()
c$balance
a$balance <- 100
c$balance
```
An object is not very useful without some behaviour defined by __methods__. RC methods are associated with a class and can modify its fields in place. In the following example, note that you access the value of fields with their name, and modify them with `<<-`. (You'll learn more about `<<-` in [Environments](#environments).)
```{r}
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    withdraw = function(x) {
      balance <<- balance - x
    },
    deposit = function(x) {
      balance <<- balance + x
    }
  )
)
```
You call an RC method in the same way as you access a field:
```{r}
a <- Account$new(balance = 100)
a$deposit(100)
a$balance
```
The final important argument to `setRefClass()` is `contains`. This is the name of the parent RC class to inherit behaviour from. The following example creates a new type of bank account that returns an error preventing the balance from going below 0.
```{r, error = TRUE}
NoOverdraft <- setRefClass("NoOverdraft",
  contains = "Account",
  methods = list(
    withdraw = function(x) {
      if (balance < x) stop("Not enough money")
      balance <<- balance - x
    }
  )
)
accountJohn <- NoOverdraft$new(balance = 100)
accountJohn$deposit(50)
accountJohn$balance
accountJohn$withdraw(200)
```
All reference classes eventually inherit from `envRefClass`. It provides useful methods like `copy()` (shown above), `callSuper()` (to call the parent method), `field()` (to get the value of a field given its name), `export()` (equivalent to `as`) and `show()` (overridden to control printing). See the inheritance section in `?setRefClass` for more details.
### Recognising objects and methods
You can recognise RC objects because they are S4 objects (`isS4(x)`) that inherit from "refClass" (`is(x, "refClass")`). `pryr::otype()` will return "RC". RC methods are also S4 objects, with class `refMethodDef`.
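These checks can be sketched on a throwaway class; `Gadget` and its `id` field are invented for this example.

```{r}
library(methods)

Gadget <- setRefClass("Gadget", fields = list(id = "numeric"))
gadget <- Gadget$new(id = 1)

isS4(gadget)                # TRUE: RC objects are S4 objects
is(gadget, "refClass")      # TRUE: and they inherit from "refClass"
is(gadget, "envRefClass")   # TRUE: the common ancestor of all RC classes
```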
### Method dispatch
Method dispatch is very simple in RC because methods are associated with classes, not functions. When you call `x$f()`, R will look for a method f in the class of x, then in its parent, then its parent's parent, and so on. From within a method, you can call the parent method directly with `callSuper(...)`.
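A short sketch of that lookup, using two hypothetical classes: `LoudCounter` overrides `add()` and then hands the work to its parent with `callSuper()`.

```{r}
library(methods)

Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    add = function(n = 1) {
      count <<- count + n
      invisible(.self)
    }
  )
)

LoudCounter <- setRefClass("LoudCounter",
  contains = "Counter",
  methods = list(
    add = function(n = 1) {
      message("Adding ", n)
      callSuper(n)   # run Counter's add() on the same object
    }
  )
)

lc <- LoudCounter$new(count = 0)
lc$add(5)
lc$count
```

`lc$add()` is found in `LoudCounter` first; `callSuper()` then continues the search in `Counter`.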
### Exercises
* Use a field function to prevent the account balance from being directly manipulated. (Hint: create a "hidden" `.balance` field, and read the help for the fields argument in `setRefClass()`)
* I claimed that there aren't any RC classes in base R, but that was a bit of a simplification. Use `getClasses()` and find which classes `extend()` from `envRefClass`. What are the classes used for? (Hint: recall how to look up the documentation for a class)
## Picking a system
Three OO systems is a lot for one language, but for most R programming, S3 suffices. In R you usually create fairly simple objects and methods for pre-existing generic functions like `print()`, `summary()` and `plot()`. S3 is well suited to this task, and the majority of OO code that I have written in R is S3. S3 is a little quirky, but it gets the job done with a minimum of code.
```{r, eval = FALSE, echo = FALSE}
packageVersion("Matrix")
gs <- getGenerics("package:Matrix")
sum(gs@package == "Matrix")
length(getClasses("package:Matrix", FALSE))
```
If you are creating more complicated systems of interrelated objects, S4 may be more appropriate. A good example is the `Matrix` package by Douglas Bates and Martin Maechler. It is designed to efficiently store and compute with many different types of sparse matrices. As of version 1.0.12, it defines 110 classes and 18 generic functions. The package is well written and well commented, and the accompanying vignette (`vignette("Intro2Matrix", package = "Matrix")`) gives a good overview of the structure of the package. S4 is also used extensively by Bioconductor packages, which need to model complicated interrelationships between biological objects. Bioconductor provides many [good resources](https://www.google.com/search?q=bioconductor+s4) for learning S4. If you've mastered S3, S4 is relatively easy to pick up: the ideas are all the same, it is just more formal, more strict and more verbose.
If you've programmed in a mainstream OO language, RC will seem very natural. But because RC objects can introduce side effects through mutable state, they are harder to understand. For example, when you call `f(a, b)` in R you can usually assume that `a` and `b` will not be modified. But if `a` and `b` are RC objects, they might be modified in place. Generally, when using RC objects you want to minimise side effects as much as possible, and use them only where mutable state is absolutely required. The majority of functions should still be "functional", and free of side effects. This makes code easier to reason about and easier for other R programmers to understand.
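To see the difference, here is a small sketch in which the same function is applied to an ordinary list and to an RC object. `Tally` and `bump()` are made-up names.

```{r}
library(methods)

Tally <- setRefClass("Tally", fields = list(n = "numeric"))

# Looks harmless: increments n on whatever it is given
bump <- function(counter) {
  counter$n <- counter$n + 1
  invisible(counter)
}

list_tally <- list(n = 0)
rc_tally <- Tally$new(n = 0)

bump(list_tally)
bump(rc_tally)

list_tally$n   # 0: the list was copied, so the caller's value is untouched
rc_tally$n     # 1: the RC object was modified in place
```

Nothing in the call `bump(rc_tally)` warns you that its argument will change, which is why it pays to keep such functions rare and clearly named.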