# Pipes
## Introduction
Use `|>` to emphasise a sequence of actions, rather than the object that the actions are being performed on.
The tidyverse has been designed to work particularly well with the pipe, but you can use it with any code, especially in conjunction with the `_` placeholder.
```{r}
strings |>
  str_replace("a", "b") |>
  str_replace("x", "y")

strings |>
  gsub("a", "b", x = _) |>
  gsub("x", "y", x = _)
```
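As a minimal base R sketch of what "a sequence of actions" means in practice, the piped form below is compared with the equivalent nested calls. The `strings` vector is a hypothetical stand-in for the undefined object above, and the code assumes R 4.2.0 or later, where the `_` placeholder is available (it must be passed to a named argument).

```{r}
# Hypothetical input; not part of the original example
strings <- c("banana", "xylophone")

# Piped: read top to bottom as a sequence of actions
piped <- strings |>
  gsub("a", "b", x = _) |>
  gsub("x", "y", x = _)

# Nested: the same two calls, read from the inside out
nested <- gsub("x", "y", x = gsub("a", "b", x = strings))

# Both forms produce the same result
identical(piped, nested)
```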
Avoid using the pipe when:
* You need to manipulate more than one object at a time. Reserve pipes for a
sequence of steps applied to one primary object.
* There are meaningful intermediate objects that could be given
informative names.
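The two cases above might look like the following sketch, which uses the built-in `mtcars` data. The object names and the 3.5 weight cutoff are illustrative assumptions, not taken from the guide.

```{r}
# Two objects are handled at once, and each intermediate result has
# an informative name, so plain assignment reads better than a pipe
heavy <- subset(mtcars, wt > 3.5)
light <- subset(mtcars, wt <= 3.5)

# The final step combines both named objects
mpg_gap <- mean(light$mpg) - mean(heavy$mpg)
```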
## Whitespace
`|>` should always have a space before it, and should usually be followed by a new line. After the first step, each line should be indented by two spaces. This structure makes it easier to add new steps (or rearrange existing steps) and harder to overlook a step.
```{r}
# Good
iris |>
  summarize(across(where(is.numeric), mean), .by = Species) |>
  pivot_longer(-Species, names_to = "measure", values_to = "value") |>
  arrange(value)

# Bad
iris |> summarize(across(where(is.numeric), mean), .by = Species) |>
pivot_longer(-Species, names_to = "measure", values_to = "value") |>
arrange(value)
```
## Long lines
If the arguments to a function don't all fit on one line, put each argument on
its own line and indent:
```{r}
# Good
iris |>
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width = mean(Sepal.Width),
    .by = Species
  )

# Bad
iris |>
  summarise(Sepal.Length = mean(Sepal.Length), Sepal.Width = mean(Sepal.Width), .by = Species)
```
For data analysis, we recommend using the pipe whenever a function needs to span multiple lines, even if it's only a single step.
```{r}
# Bad
summarise(
  iris,
  Sepal.Length = mean(Sepal.Length),
  Sepal.Width = mean(Sepal.Width),
  .by = Species
)
```
## Short pipes
It's ok to write a short pipe on a single line:
```{r}
# Ok
iris |> subset(Species == "virginica") |> _$Sepal.Length
iris |> summarise(width = mean(Sepal.Width), .by = Species) |> arrange(width)
```
But because short pipes often become longer pipes, we recommend that you generally stick to one function per line:
```{r}
# Better
iris |>
  subset(Species == "virginica") |>
  _$Sepal.Length

iris |>
  summarise(width = mean(Sepal.Width), .by = Species) |>
  arrange(width)
```
Sometimes it's useful to include a short pipe as an argument to a function in a
longer pipe. Carefully consider whether the code is more readable with a short
inline pipe (which doesn't require a lookup elsewhere) or if it's better to move
the code outside the pipe and give it an evocative name.
```{r}
# Good
x |>
  semi_join(y |> filter(is_valid))

# Ok
x |>
  select(a, b, w) |>
  left_join(y |> select(a, b, v), join_by(a, b))

# Better
x_join <- x |> select(a, b, w)
y_join <- y |> select(a, b, v)
left_join(x_join, y_join, join_by(a, b))
```
## Assignment
There are three acceptable forms of assignment:
* Variable name and assignment on separate lines:
```{r}
iris_long <-
  iris |>
  gather(measure, value, -Species) |>
  arrange(-value)
```
* Variable name and assignment on the same line:
```{r}
iris_long <- iris |>
  gather(measure, value, -Species) |>
  arrange(-value)
```
* Assignment at the end of the pipe with `->`:
```{r}
iris |>
  gather(measure, value, -Species) |>
  arrange(-value) ->
  iris_long
```
I think that the third is the most natural to write, but it makes reading a
little harder: when the name comes first, it can act as a heading to remind
you of the purpose of the pipe.
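The three forms differ only in layout. A minimal base R sketch, using the built-in `mtcars` data and hypothetical names `small_a`, `small_b`, and `small_c`, shows that they create identical objects:

```{r}
# Form 1: name and assignment on their own line
small_a <-
  mtcars |>
  subset(cyl == 4) |>
  head(3)

# Form 2: name and assignment on the same line as the first step
small_b <- mtcars |>
  subset(cyl == 4) |>
  head(3)

# Form 3: assignment at the end of the pipe with `->`
mtcars |>
  subset(cyl == 4) |>
  head(3) ->
  small_c

# All three assignments yield the same data frame
identical(small_a, small_b) && identical(small_b, small_c)
```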
## magrittr
We recommend you use the base `|>` pipe instead of magrittr's `%>%`.
```{r}
# Good
iris |>
  summarise(width = mean(Sepal.Width), .by = Species) |>
  arrange(width)

# Bad
iris %>%
  summarise(width = mean(Sepal.Width), .by = Species) %>%
  arrange(width)
```
As of R 4.3.0, the base pipe provides all the features from magrittr that we recommend using.
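As a rough sketch of what migrating might involve, the block below rewrites two common magrittr idioms with the base pipe. The choice of idioms is an assumption made for illustration, not a list from this guide, and the magrittr forms appear only in comments. The code assumes R 4.2.0 or later for the `_` placeholder; the `\(v)` function shorthand is available from R 4.1.0.

```{r}
# magrittr: mtcars %>% lm(mpg ~ wt, data = .)
# base: pass the placeholder `_` to a named argument
fit <- mtcars |>
  lm(mpg ~ wt, data = _)

# magrittr: mtcars$mpg %>% { max(.) - min(.) }
# base: call an anonymous function on the piped value
mpg_range <- mtcars$mpg |>
  (\(v) max(v) - min(v))()
```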