forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathS3.Rmd
965 lines (682 loc) · 37.3 KB
/
S3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
# S3 {#s3}
```{r, include = FALSE}
source("common.R")
```
## Introduction
\index{S3}
S3 is R's first and simplest OO system. S3 is informal and ad hoc, but there is a certain elegance in its minimalism: you can't take away any part of it and still have a useful OO system. For these reasons, you should use it, unless you have a compelling reason to do otherwise. S3 is the only OO system used in the base and stats packages, and it's the most commonly used system in CRAN packages.
S3 is very flexible, which means it allows you to do things that are quite ill-advised. If you're coming from a strict environment like Java this will seem pretty frightening, but it gives R programmers a tremendous amount of freedom. It may be very difficult to prevent people from doing something you don't want them to do, but your users will never be held back because there is something you haven't implemented yet. Since S3 has few built-in constraints, the key to its successful use is applying the constraints yourself. This chapter will therefore teach you the conventions you should (almost) always follow.
The goal of this chapter is to show you how the S3 system works, not how to use it effectively to create new classes and generics. I'd recommend coupling the theoretical knowledge from this chapter with the practical knowledge encoded in the [vctrs package](https://vctrs.r-lib.org).
### Outline {-}
* Section \@ref(s3-basics) gives a rapid overview of all the main components
of S3: classes, generics, and methods. You'll also learn about
`sloop::s3_dispatch()`, which we'll use throughout the chapter to explore
how S3 works.
* Section \@ref(s3-classes) goes into the details of creating a new S3 class,
including the three functions that should accompany most classes:
a constructor, a helper, and a validator.
* Section \@ref(s3-methods) describes how S3 generics and methods work,
including the basics of method dispatch.
* Section \@ref(object-styles) discusses the four main styles of S3 objects:
vector, record, data frame, and scalar.
* Section \@ref(s3-inheritance) demonstrates how inheritance works in S3,
and shows you what you need to make a class "subclassable".
* Section \@ref(s3-dispatch) concludes the chapter with a discussion of the
finer details of method dispatch including base types, internal generics,
group generics, and double dispatch.
### Prerequisites {-}
S3 classes are implemented using attributes, so make sure you're familiar with the details described in Section \@ref(attributes). We'll use existing base S3 vectors for examples and exploration, so make sure that you're familiar with the factor, Date, difftime, POSIXct, and POSIXlt classes described in Section \@ref(s3-atomic-vectors).
We'll use the [sloop](https://sloop.r-lib.org) package for its interactive helpers.
```{r setup, messages = FALSE}
library(sloop)
```
## Basics {#s3-basics}
\index{attributes!class}
\index{classes!S3}
\indexc{class()}
An S3 object is a base type with at least a `class` attribute (other attributes may be used to store other data). For example, take the factor. Its base type is the integer vector, it has a `class` attribute of "factor", and a `levels` attribute that stores the possible levels:
```{r}
f <- factor(c("a", "b", "c"))
typeof(f)
attributes(f)
```
You can get the underlying base type by `unclass()`ing it, which strips the class attribute, causing it to lose its special behaviour:
```{r}
unclass(f)
```
\index{generics}
\index{functions!generic}
An S3 object behaves differently from its underlying base type whenever it's passed to a __generic__ (short for generic function). The easiest way to tell if a function is a generic is to use `sloop::ftype()` and look for "generic" in the output:
```{r}
ftype(print)
ftype(str)
ftype(unclass)
```
A generic function defines an interface, which uses a different implementation depending on the class of an argument (almost always the first argument). Many base R functions are generic, including the important `print()`:
```{r}
print(f)
# stripping class reverts to integer behaviour
print(unclass(f))
```
Beware that `str()` is generic, and some S3 classes use that generic to hide the internal details. For example, the `POSIXlt` class used to represent date-time data is actually built on top of a list, a fact which is hidden by its `str()` method:
```{r}
time <- strptime(c("2017-01-01", "2020-05-04 03:21"), "%Y-%m-%d")
str(time)
str(unclass(time))
```
The generic is a middleman: its job is to define the interface (i.e. the arguments) then find the right implementation for the job. The implementation for a specific class is called a __method__, and the generic finds that method by performing __method dispatch__.
You can use `sloop::s3_dispatch()` to see the process of method dispatch:
```{r}
s3_dispatch(print(f))
```
\index{S3!methods}
We'll come back to the details of dispatch in Section \@ref(method-dispatch), for now note that S3 methods are functions with a special naming scheme, `generic.class()`. For example, the `factor` method for the `print()` generic is called `print.factor()`. You should never call the method directly, but instead rely on the generic to find it for you.
Generally, you can identify a method by the presence of `.` in the function name, but there are a number of important functions in base R that were written before S3, and hence use `.` to join words. If you're unsure, check with `sloop::ftype()`:
```{r}
ftype(t.test)
ftype(t.data.frame)
```
\index{S3!finding source}
Unlike most functions, you can't see the source code for most S3 methods[^base-s3] just by typing their names. That's because S3 methods are not usually exported: they live only inside the package, and are not available from the global environment. Instead, you can use `sloop::s3_get_method()`, which will work regardless of where the method lives:
```{r, error = TRUE}
weighted.mean.Date
s3_get_method(weighted.mean.Date)
```
[^base-s3]: The exceptions are methods found in the base package, like `t.data.frame`, and methods that you've created.
### Exercises
1. Describe the difference between `t.test()` and `t.data.frame()`.
When is each function called?
1. Make a list of commonly used base R functions that contain `.` in their
name but are not S3 methods.
1. What does the `as.data.frame.data.frame()` method do? Why is
it confusing? How could you avoid this confusion in your own
code?
1. Describe the difference in behaviour in these two calls.
```{r}
set.seed(1014)
some_days <- as.Date("2017-01-31") + sample(10, 5)
mean(some_days)
mean(unclass(some_days))
```
1. What class of object does the following code return? What base type is it
built on? What attributes does it use?
```{r}
x <- ecdf(rpois(100, 10))
x
```
1. What class of object does the following code return? What base type is it
built on? What attributes does it use?
```{r}
x <- table(rpois(100, 5))
x
```
## Classes {#s3-classes}
\index{S3!classes}
\index{attributes!class}
\indexc{class()}
If you have done object-oriented programming in other languages, you may be surprised to learn that S3 has no formal definition of a class: to make an object an instance of a class, you simply set the __class attribute__. You can do that during creation with `structure()`, or after the fact with `class<-()`:
```{r}
# Create and assign class in one step
x <- structure(list(), class = "my_class")
# Create, then set class
x <- list()
class(x) <- "my_class"
```
You can determine the class of an S3 object with `class(x)`, and see if an object is an instance of a class using `inherits(x, "classname")`.
```{r}
class(x)
inherits(x, "my_class")
inherits(x, "your_class")
```
The class name can be any string, but I recommend using only letters and `_`. Avoid `.` because (as mentioned earlier) it can be confused with the `.` separator between a generic name and a class name. When using a class in a package, I recommend including the package name in the class name. That ensures you won't accidentally clash with a class defined by another package.
S3 has no checks for correctness which means you can change the class of existing objects:
```{r, error = TRUE}
# Create a linear model
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
print(mod)
# Turn it into a date (?!)
class(mod) <- "Date"
# Unsurprisingly this doesn't work very well
print(mod)
```
If you've used other OO languages, this might make you feel queasy, but in practice this flexibility causes few problems. R doesn't stop you from shooting yourself in the foot, but as long as you don't aim the gun at your toes and pull the trigger, you won't have a problem.
To avoid foot-bullet intersections when creating your own class, I recommend that you usually provide three functions:
* A low-level __constructor__, `new_myclass()`, that efficiently creates new
objects with the correct structure.
* A __validator__, `validate_myclass()`, that performs more computationally
expensive checks to ensure that the object has correct values.
* A user-friendly __helper__, `myclass()`, that provides a convenient way for
others to create objects of your class.
You don't need a validator for very simple classes, and you can skip the helper if the class is for internal use only, but you should always provide a constructor.
### Constructors {#s3-constructor}
\index{S3!constructors}
\index{constructors!S3}
S3 doesn't provide a formal definition of a class, so it has no built-in way to ensure that all objects of a given class have the same structure (i.e. the same base type and the same attributes with the same types). Instead, you must enforce a consistent structure by using a __constructor__.
The constructor should follow three principles:
* Be called `new_myclass()`.
* Have one argument for the base object, and one for each attribute.
* Check the type of the base object and the types of each attribute.
I'll illustrate these ideas by creating constructors for base classes[^base-constructors] that you're already familiar with. To start, lets make a constructor for the simplest S3 class: `Date`. A `Date` is just a double with a single attribute: its `class` is "Date". This makes for a very simple constructor:
[^base-constructors]: Recent versions of R have `.Date()`, `.difftime()`, `.POSIXct()`, and `.POSIXlt()` constructors but they are internal, not well documented, and do not follow the principles that I recommend.
\indexc{Date}
```{r}
new_Date <- function(x = double()) {
stopifnot(is.double(x))
structure(x, class = "Date")
}
new_Date(c(-1, 0, 1))
```
The purpose of constructors is to help you, the developer. That means you can keep them simple, and you don't need to optimise error messages for public consumption. If you expect users to also create objects, you should create a friendly helper function, called `class_name()`, which I'll describe shortly.
A slightly more complicated constructor is that for `difftime`, which is used to represent time differences. It is again built on a double, but has a `units` attribute that must take one of a small set of values:
\indexc{difftime}
```{r}
new_difftime <- function(x = double(), units = "secs") {
stopifnot(is.double(x))
units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))
structure(x,
class = "difftime",
units = units
)
}
new_difftime(c(1, 10, 3600), "secs")
new_difftime(52, "weeks")
```
The constructor is a developer function: it will be called in many places, by an experienced user. That means it's OK to trade a little safety in return for performance, and you should avoid potentially time-consuming checks in the constructor.
### Validators
\index{S3!validators}
\index{validators!S3}
More complicated classes require more complicated checks for validity. Take factors, for example. A constructor only checks that types are correct, making it possible to create malformed factors:
\indexc{factor}
```{r, error = TRUE}
new_factor <- function(x = integer(), levels = character()) {
stopifnot(is.integer(x))
stopifnot(is.character(levels))
structure(
x,
levels = levels,
class = "factor"
)
}
new_factor(1:5, "a")
new_factor(0:1, "a")
```
Rather than encumbering the constructor with complicated checks, it's better to put them in a separate function. Doing so allows you to cheaply create new objects when you know that the values are correct, and easily re-use the checks in other places.
```{r, error = TRUE}
validate_factor <- function(x) {
values <- unclass(x)
levels <- attr(x, "levels")
if (!all(!is.na(values) & values > 0)) {
stop(
"All `x` values must be non-missing and greater than zero",
call. = FALSE
)
}
if (length(levels) < max(values)) {
stop(
"There must be at least as many `levels` as possible values in `x`",
call. = FALSE
)
}
x
}
validate_factor(new_factor(1:5, "a"))
validate_factor(new_factor(0:1, "a"))
```
This validator function is called primarily for its side-effects (throwing an error if the object is invalid) so you'd expect it to invisibly return its primary input (as described in Section \@ref(invisible)). However, it's useful for validation methods to return visibly, as we'll see next.
### Helpers
\index{S3!helpers}
\index{helpers!S3}
If you want users to construct objects from your class, you should also provide a helper method that makes their life as easy as possible. A helper should always:
* Have the same name as the class, e.g. `myclass()`.
* Finish by calling the constructor, and the validator, if it exists.
* Create carefully crafted error messages tailored towards an end-user.
* Have a thoughtfully crafted user interface with carefully chosen default
values and useful conversions.
The last bullet is the trickiest, and it's hard to give general advice. However, there are three common patterns:
* Sometimes all the helper needs to do is coerce its inputs to the desired
type. For example, `new_difftime()` is very strict, and violates the usual
convention that you can use an integer vector wherever you can use a
double vector:
```{r, error = TRUE}
new_difftime(1:10)
```
It's not the job of the constructor to be flexible, so here we create
a helper that just coerces the input to a double.
```{r}
difftime <- function(x = double(), units = "secs") {
x <- as.double(x)
new_difftime(x, units = units)
}
difftime(1:10)
```
\indexc{difftime}
* Often, the most natural representation of a complex object is a string.
For example, it's very convenient to specify factors with a character
vector. The code below shows a simple version of `factor()`: it takes a
character vector, and guesses that the levels should be the unique values.
This is not always correct (since some levels might not be seen in the
data), but it's a useful default.
```{r, error = TRUE}
factor <- function(x = character(), levels = unique(x)) {
ind <- match(x, levels)
validate_factor(new_factor(ind, levels))
}
factor(c("a", "a", "b"))
```
\indexc{factor}
* Some complex objects are most naturally specified by multiple simple
components. For example, I think it's natural to construct a date-time
by supplying the individual components (year, month, day etc). That leads
me to this `POSIXct()` helper that resembles the existing `ISODatetime()`
function[^efficient]:
```{r}
POSIXct <- function(year = integer(),
month = integer(),
day = integer(),
hour = 0L,
minute = 0L,
sec = 0,
tzone = "") {
ISOdatetime(year, month, day, hour, minute, sec, tz = tzone)
}
POSIXct(2020, 1, 1, tzone = "America/New_York")
```
\indexc{POSIXct}
[^efficient]: This helper is not efficient: behind the scenes `ISODatetime()` works by pasting the components into a string and then using `strptime()`. A more efficient equivalent is available in `lubridate::make_datetime()`.
For more complicated classes, you should feel free to go beyond these patterns to make life as easy as possible for your users.
### Exercises
1. Write a constructor for `data.frame` objects. What base type is a data
frame built on? What attributes does it use? What are the restrictions
placed on the individual elements? What about the names?
1. Enhance my `factor()` helper to have better behaviour when one or
more `values` is not found in `levels`. What does `base::factor()` do
in this situation?
1. Carefully read the source code of `factor()`. What does it do that
my constructor does not?
1. Factors have an optional "contrasts" attribute. Read the help for `C()`,
and briefly describe the purpose of the attribute. What type should it
have? Rewrite the `new_factor()` constructor to include this attribute.
1. Read the documentation for `utils::as.roman()`. How would you write a
constructor for this class? Does it need a validator? What might a helper
do?
## Generics and methods {#s3-methods}
\indexc{UseMethod()}
\index{S3!generics}
\index{generics!S3}
The job of an S3 generic is to perform method dispatch, i.e. find the specific implementation for a class. Method dispatch is performed by `UseMethod()`, which every generic calls[^internal-generic]. `UseMethod()` takes two arguments: the name of the generic function (required), and the argument to use for method dispatch (optional). If you omit the second argument, it will dispatch based on the first argument, which is almost always what is desired.
[^internal-generic]: The exception is internal generics, which are implemented in C, and are the topic of Section \@ref(internal-generics).
Most generics are very simple, and consist of only a call to `UseMethod()`. Take `mean()` for example:
```{r}
mean
```
Creating your own generic is similarly simple:
```{r}
my_new_generic <- function(x) {
UseMethod("my_new_generic")
}
```
(If you wonder why we have to repeat `my_new_generic` twice, think back to Section \@ref(first-class-functions).)
You don't pass any of the arguments of the generic to `UseMethod()`; it uses deep magic to pass to the method automatically. The precise process is complicated and frequently surprising, so you should avoid doing any computation in a generic. To learn the full details, carefully read the Technical Details section in `?UseMethod`.
### Method dispatch
\index{S3!method dispatch}
\index{method dispatch!S3}
How does `UseMethod()` work? It basically creates a vector of method names, `paste0("generic", ".", c(class(x), "default"))`, and then looks for each potential method in turn. We can see this in action with `sloop::s3_dispatch()`. You give it a call to an S3 generic, and it lists all the possible methods. For example, what method is called when you print a `Date` object?
```{r}
x <- Sys.Date()
s3_dispatch(print(x))
```
The output here is simple:
* `=>` indicates the method that is called, here `print.Date()`
* `*` indicates a method that is defined, but not called, here `print.default()`.
The "default" class is a special __pseudo-class__. This is not a real class, but is included to make it possible to define a standard fallback that is found whenever a class-specific method is not available.
The essence of method dispatch is quite simple, but as the chapter proceeds you'll see it get progressively more complicated to encompass inheritance, base types, internal generics, and group generics. The code below shows a couple of more complicated cases which we'll come back to in Sections \@ref(inheritance) and \@ref(s3-dispatch).
```{r}
x <- matrix(1:10, nrow = 2)
s3_dispatch(mean(x))
s3_dispatch(sum(Sys.time()))
```
### Finding methods
\index{S3!methods!locating}
`sloop::s3_dispatch()` lets you find the specific method used for a single call. What if you want to find all methods defined for a generic or associated with a class? That's the job of `sloop::s3_methods_generic()` and `sloop::s3_methods_class()`:
```{r}
s3_methods_generic("mean")
s3_methods_class("ordered")
```
### Creating methods {#s3-arguments}
\index{S3!methods!creating}
\index{methods!S3}
There are two wrinkles to be aware of when you create a new method:
* First, you should only ever write a method if you own the generic or the
class. R will allow you to define a method even if you don't, but it is
exceedingly bad manners. Instead, work with the author of either the
generic or the class to add the method in their code.
* A method must have the same arguments as its generic. This is enforced in
packages by `R CMD check`, but it's good practice even if you're not
creating a package.
There is one exception to this rule: if the generic has `...`, the method
can contain a superset of the arguments. This allows methods to take
arbitrary additional arguments. The downside of using `...`, however, is
that any misspelled arguments will be silently swallowed[^ellipsis],
as mentioned in Section \@ref(fun-dot-dot-dot).
[^ellipsis]: See <https://github.com/hadley/ellipsis> for an experimental way of warning when methods fail to use all the arguments in `...`, providing a potential resolution of this issue.
### Exercises
1. Read the source code for `t()` and `t.test()` and confirm that
`t.test()` is an S3 generic and not an S3 method. What happens if
you create an object with class `test` and call `t()` with it? Why?
```{r, results = FALSE}
x <- structure(1:10, class = "test")
t(x)
```
1. What generics does the `table` class have methods for?
1. What generics does the `ecdf` class have methods for?
1. Which base generic has the greatest number of defined methods?
1. Carefully read the documentation for `UseMethod()` and explain why the
following code returns the results that it does. What two usual rules
of function evaluation does `UseMethod()` violate?
```{r}
g <- function(x) {
x <- 10
y <- 10
UseMethod("g")
}
g.default <- function(x) c(x = x, y = y)
x <- 1
y <- 1
g(x)
```
1. What are the arguments to `[`? Why is this a hard question to answer?
## Object styles
\index{S3!object styles}
So far I've focussed on vector style classes like `Date` and `factor`. These have the key property that `length(x)` represents the number of observations in the vector. There are three variants that do not have this property:
* Record style objects use a list of equal-length vectors to represent
individual components of the object. The best example of this is `POSIXlt`,
which underneath the hood is a list of 11 date-time components like year,
month, and day. Record style classes override `length()` and subsetting
methods to conceal this implementation detail.
```{r}
x <- as.POSIXlt(ISOdatetime(2020, 1, 1, 0, 0, 1:3))
x
length(x)
length(unclass(x))
x[[1]] # the first date time
unclass(x)[[1]] # the first component, the number of seconds
```
\indexc{POSIXlt}
* Data frames are similar to record style objects in that both use lists of
equal length vectors. However, data frames are conceptually two dimensional,
and the individual components are readily exposed to the user. The number of
observations is the number of rows, not the length:
```{r}
x <- data.frame(x = 1:100, y = 1:100)
length(x)
nrow(x)
```
\indexc{Date}
* Scalar objects typically use a list to represent a single thing.
For example, an `lm` object is a list of length 12 but it represents one
model.
```{r}
mod <- lm(mpg ~ wt, data = mtcars)
length(mod)
```
Scalar objects can also be built on top of functions, calls, and
environments[^s3-pairlist]. This is less generally useful, but you can see
applications in `stats::ecdf()`, R6 (Chapter \@ref(r6)), and
`rlang::quo()` (Chapter \@ref(quasiquotation)).
\indexc{lm()}
[^s3-pairlist]: You can also build an object on top of a pairlist, but I have yet to find a good reason to do so.
Unfortunately, describing the appropriate use of each of these object styles is beyond the scope of this book. However, you can learn more from the documentation of the vctrs package (<https://vctrs.r-lib.org>); the package also provides constructors and helpers that make implementation of the different styles easier.
### Exercises
1. Categorise the objects returned by `lm()`, `factor()`, `table()`,
`as.Date()`, `as.POSIXct()` `ecdf()`, `ordered()`, `I()` into the
styles described above.
1. What would a constructor function for `lm` objects, `new_lm()`, look like?
Use `?lm` and experimentation to figure out the required fields and their
types.
## Inheritance {#s3-inheritance}
\index{S3!inheritance}
\index{S3!methods!inheriting}
\index{inheritance!S3}
S3 classes can share behaviour through a mechanism called __inheritance__. Inheritance is powered by three ideas:
* The class can be a character _vector_. For example, the `ordered` and
`POSIXct` classes have two components in their class:
```{r}
class(ordered("x"))
class(Sys.time())
```
\indexc{POSIXct}
* If a method is not found for the class in the first element of the
vector, R looks for a method for the second class (and so on):
```{r}
s3_dispatch(print(ordered("x")))
s3_dispatch(print(Sys.time()))
```
* A method can delegate work by calling `NextMethod()`. We'll come back to
that very shortly; for now, note that `s3_dispatch()` reports delegation
with `->`.
```{r}
s3_dispatch(ordered("x")[1])
s3_dispatch(Sys.time()[1])
```
Before we continue we need a bit of vocabulary to describe the relationship between the classes that appear together in a class vector. We'll say that `ordered` is a __subclass__ of `factor` because it always appears before it in the class vector, and, conversely, we'll say `factor` is a __superclass__ of `ordered`.
S3 imposes no restrictions on the relationship between sub- and superclasses but your life will be easier if you impose some. I recommend that you adhere to two simple principles when creating a subclass:
* The base type of the subclass should be that same as the superclass.
* The attributes of the subclass should be a superset of the attributes
of the superclass.
`POSIXt` does not adhere to these principles because `POSIXct` has type double, and `POSIXlt` has type list. This means that `POSIXt` is not a superclass, and illustrates that it's quite possible to use the S3 inheritance system to implement other styles of code sharing (here `POSIXt` plays a role more like an interface), but you'll need to figure out safe conventions yourself.
\indexc{POSIXt}
### `NextMethod()`
\indexc{NextMethod()}
`NextMethod()` is the hardest part of inheritance to understand, so we'll start with a concrete example for the most common use case: `[`. We'll start by creating a simple toy class: a `secret` class that hides its output when printed:
```{r}
new_secret <- function(x = double()) {
stopifnot(is.double(x))
structure(x, class = "secret")
}
print.secret <- function(x, ...) {
print(strrep("x", nchar(x)))
invisible(x)
}
x <- new_secret(c(15, 1, 456))
x
```
This works, but the default `[` method doesn't preserve the class:
```{r}
s3_dispatch(x[1])
x[1]
```
To fix this, we need to provide a `[.secret` method. How could we implement this method? The naive approach won't work because we'll get stuck in an infinite loop:
```{r}
`[.secret` <- function(x, i) {
new_secret(x[i])
}
```
Instead, we need some way to call the underlying `[` code, i.e. the implementation that would get called if we didn't have a `[.secret` method. One approach would be to `unclass()` the object:
```{r}
`[.secret` <- function(x, i) {
x <- unclass(x)
new_secret(x[i])
}
x[1]
```
This works, but is inefficient because it creates a copy of `x`. A better approach is to use `NextMethod()`, which concisely solves the problem of delegating to the method that would have been called if `[.secret` didn't exist:
```{r}
`[.secret` <- function(x, i) {
new_secret(NextMethod())
}
x[1]
```
We can see what's going on with `sloop::s3_dispatch()`:
```{r}
s3_dispatch(x[1])
```
The `=>` indicates that `[.secret` is called, but that `NextMethod()` delegates work to the underlying internal `[` method, as shown by the `->`.
As with `UseMethod()`, the precise semantics of `NextMethod()` are complex. In particular, it tracks the list of potential next methods with a special variable, which means that modifying the object that's being dispatched upon will have no impact on which method gets called next.
### Allowing subclassing {#s3-subclassing}
\index{S3!subclassing}
When you create a class, you need to decide if you want to allow subclasses, because it requires some changes to the constructor and careful thought in your methods.
To allow subclasses, the parent constructor needs to have `...` and `class` arguments:
```{r}
new_secret <- function(x, ..., class = character()) {
stopifnot(is.double(x))
structure(
x,
...,
class = c(class, "secret")
)
}
```
Then the subclass constructor can just call to the parent class constructor with additional arguments as needed. For example, imagine we want to create a supersecret class which also hides the number of characters:
```{r}
new_supersecret <- function(x) {
new_secret(x, class = "supersecret")
}
print.supersecret <- function(x, ...) {
print(rep("xxxxx", length(x)))
invisible(x)
}
x2 <- new_supersecret(c(15, 1, 456))
x2
```
To allow inheritance, you also need to think carefully about your methods, as you can no longer use the constructor. If you do, the method will always return the same class, regardless of the input. This forces whoever makes a subclass to do a lot of extra work.
Concretely, this means we need to revise the `[.secret` method. Currently it always returns a `secret()`, even when given a supersecret:
```{r}
`[.secret` <- function(x, ...) {
new_secret(NextMethod())
}
x2[1:3]
```
\indexc{vec\_restore()}
We want to make sure that `[.secret` returns the same class as `x` even if it's a subclass. As far as I can tell, there is no way to solve this problem using base R alone. Instead, you'll need to use the vctrs package, which provides a solution in the form of the `vctrs::vec_restore()` generic. This generic takes two inputs: an object which has lost subclass information, and a template object to use for restoration.
Typically `vec_restore()` methods are quite simple: you just call the constructor with appropriate arguments:
```{r}
vec_restore.secret <- function(x, to, ...) new_secret(x)
vec_restore.supersecret <- function(x, to, ...) new_supersecret(x)
```
(If your class has attributes, you'll need to pass them from `to` into the constructor.)
Now we can use `vec_restore()` in the `[.secret` method:
```{r}
`[.secret` <- function(x, ...) {
vctrs::vec_restore(NextMethod(), x)
}
x2[1:3]
```
(I only fully understood this issue quite recently, so at time of writing it is not used in the tidyverse. Hopefully by the time you're reading this, it will have rolled out, making it much easier to (e.g.) subclass tibbles.)
If you build your class using the tools provided by the vctrs package, `[` will gain this behaviour automatically. You will only need to provide your own `[` method if you use attributes that depend on the data or want non-standard subsetting behaviour. See `?vctrs::new_vctr` for details.
### Exercises
1. How does `[.Date` support subclasses? How does it fail to support
subclasses?
1. R has two classes for representing date time data, `POSIXct` and
`POSIXlt`, which both inherit from `POSIXt`. Which generics have
different behaviours for the two classes? Which generics share the same
behaviour?
1. What do you expect this code to return? What does it actually return?
Why?
```{r, eval = FALSE}
generic2 <- function(x) UseMethod("generic2")
generic2.a1 <- function(x) "a1"
generic2.a2 <- function(x) "a2"
generic2.b <- function(x) {
class(x) <- "a1"
NextMethod()
}
generic2(structure(list(), class = c("b", "a2")))
```
## Dispatch details {#s3-dispatch}
\index{S3!method dispatch}
This chapter concludes with a few additional details about method dispatch. It is safe to skip these details if you're new to S3.
### S3 and base types {#implicit-class}
\index{implicit class}
\index{S3!implicit class}
What happens when you call an S3 generic with a base object, i.e. an object with no class? You might think it would dispatch on what `class()` returns:
```{r}
class(matrix(1:5))
```
But unfortunately dispatch actually occurs on the __implicit class__, which has three components:
* The string "array" or "matrix" if the object has dimensions
* The result of `typeof()` with a few minor tweaks
* The string "numeric" if object is "integer" or "double"
There is no base function that will compute the implicit class, but you can use `sloop::s3_class()`
```{r}
s3_class(matrix(1:5))
```
This is used by `s3_dispatch()`:
```{r}
s3_dispatch(print(matrix(1:5)))
```
This means that the `class()` of an object does not uniquely determine its dispatch:
```{r}
x1 <- 1:5
class(x1)
s3_dispatch(mean(x1))
x2 <- structure(x1, class = "integer")
class(x2)
s3_dispatch(mean(x2))
```
### Internal generics {#internal-generics}
\index{generics!internal}
Some base functions, like `[`, `sum()`, and `cbind()`, are called __internal generics__ because they don't call `UseMethod()` but instead call the C functions `DispatchGroup()` or `DispatchOrEval()`. `s3_dispatch()` shows internal generics by including the name of the generic followed by `(internal)`:
```{r}
s3_dispatch(Sys.time()[1])
```
For performance reasons, internal generics do not dispatch to methods unless the class attribute has been set, which means that internal generics do not use the implicit class. Again, if you're ever confused about method dispatch, you can rely on `s3_dispatch()`.
### Group generics
\index{S3!group generics}
\index{generics!group}
Group generics are the most complicated part of S3 method dispatch because they involve both `NextMethod()` and internal generics. Like internal generics, they only exist in base R, and you cannot define your own group generic.
There are four group generics:
* __Math__: `abs()`, `sign()`, `sqrt()`, `floor()`, `cos()`, `sin()`, `log()`,
and more (see `?Math` for the complete list).
* __Ops__: `+`, `-`, `*`, `/`, `^`, `%%`, `%/%`, `&`, `|`, `!`, `==`, `!=`, `<`,
`<=`, `>=`, and `>`.
* __Summary__: `all()`, `any()`, `sum()`, `prod()`, `min()`, `max()`, and
`range()`.
* __Complex__: `Arg()`, `Conj()`, `Im()`, `Mod()`, `Re()`.
Defining a single group generic for your class overrides the default behaviour for all of the members of the group. Methods for group generics are looked for only if the methods for the specific generic do not exist:
```{r}
s3_dispatch(sum(Sys.time()))
```
Most group generics involve a call to `NextMethod()`. For example, take `difftime()` objects. If you look at the method dispatch for `abs()`, you'll see there's a `Math` group generic defined.
```{r}
y <- as.difftime(10, units = "mins")
s3_dispatch(abs(y))
```
`Math.difftime` basically looks like this:
```{r}
Math.difftime <- function(x, ...) {
new_difftime(NextMethod(), units = attr(x, "units"))
}
```
It dispatches to the next method, here the internal default, to perform the actual computation, then restore the class and attributes. (To better support subclasses of `difftime` this would need to call `vec_restore()`, as described in Section \@ref(s3-subclassing).)
Inside a group generic function a special variable `.Generic` provides the actual generic function called. This can be useful when producing error messages, and can sometimes be useful if you need to manually re-call the generic with different arguments.
### Double dispatch
\index{double dispatch}
\index{method dispatch!S3!double dispatch}
Generics in the Ops group, which includes the two-argument arithmetic and Boolean operators like `-` and `&`, implement a special type of method dispatch. They dispatch on the type of _both_ of the arguments, which is called __double dispatch__. This is necessary to preserve the commutative property of many operators, i.e. `a + b` should equal `b + a`. Take the following simple example:
```{r}
date <- as.Date("2017-01-01")
integer <- 1L
date + integer
integer + date
```
If `+` dispatched only on the first argument, it would return different values for the two cases. To overcome this problem, generics in the Ops group use a slightly different strategy from usual. Rather than doing a single method dispatch, they do two, one for each input. There are three possible outcomes of this lookup:
* The methods are the same, so it doesn't matter which method is used.
* The methods are different, and R falls back to the internal method with
a warning.
* One method is internal, in which case R calls the other method.
This approach is error prone so if you want to implement robust double dispatch for algebraic operators, I recommend using the vctrs package. See `?vctrs::vec_arith` for details.
### Exercises
1. Explain the differences in dispatch below:
```{r}
length.integer <- function(x) 10
x1 <- 1:5
class(x1)
s3_dispatch(length(x1))
x2 <- structure(x1, class = "integer")
class(x2)
s3_dispatch(length(x2))
```
1. What classes have a method for the `Math` group generic in base R? Read
the source code. How do the methods work?
1. `Math.difftime()` is more complicated than I described. Why?