---
title: "Functions"
---
## Libraries
```{r}
#| label: setup
#| message: false
#| warning: false
library(tidyverse)
library(janitor)
```
## The data
We'll mostly use the starwars data that comes with the tidyverse (it ships with dplyr).
But sometimes not.
## read
Read in the data at https://bit.ly/jedr-rebels, then:

- clean the names
- assign it to an object

[read_csv](https://readr.tidyverse.org/reference/read_delim.html) is one of a family of readr functions (read_tsv(), read_delim() and others) that all work similarly.
```{r}
rebels <-
  read_csv("https://bit.ly/jedr-rebels") |>
  clean_names()
```
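clean_names() turns messy headers into snake_case names that are easy to type. A minimal sketch on a small made-up tibble (the column names and values are hypothetical, not taken from the rebels file):

```{r}
library(tibble)
library(janitor)

# Hypothetical data with spreadsheet-style headers (spaces, capitals)
messy <- tibble(
  `Episode Title` = c("Part 1", "Part 2"),
  `Original Air Date` = c("01-02-2020", "08-02-2020")
)

# clean_names() lowercases the names and replaces spaces with underscores
tidy <- clean_names(messy)
names(tidy)
```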
## glimpse
Lets you peek at all the variables (columns) at once, with each one's type and first few values.
```{r}
starwars |> glimpse()
```
Can also be written without the pipe:
```{r}
glimpse(starwars)
```
## summary
```{r}
summary(rebels)
```
## select
This function selects specific columns or variables.
```{r}
starwars |>
  select(name, height)
```
## mutate
### Centimeters to inches

Divide by 2.54 (one inch is 2.54 centimeters).

```{r}
starwars |>
  mutate(
    height_in = height / 2.54,
    .after = height
  )
```
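To sanity-check the direction of the conversion, here is a small helper. The name cm_to_in() is hypothetical, not from any package. Since 254 cm is exactly 100 inches, that is an easy value to test.

```{r}
library(dplyr)

# Hypothetical helper: 1 inch = 2.54 cm, so divide to go from cm to inches
cm_to_in <- function(cm) cm / 2.54

cm_to_in(254)

# The same helper used inside mutate() on the starwars data
starwars |>
  mutate(height_in = cm_to_in(height), .after = height) |>
  select(name, height, height_in)
```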
### Convert a date
Uses dmy() (day-month-year) from [lubridate](https://lubridate.tidyverse.org/) to turn the text into a real date. lubridate has a parser for each order, such as mdy() and ymd().

```{r}
rebels |>
  mutate(
    air_date = dmy(original_air_date),
    .after = original_air_date
  )
```
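The parser you pick should match the order of day, month, and year in the text. A quick sketch with made-up date strings that all describe the same day:

```{r}
library(lubridate)

# The same calendar date written three ways; each parser expects its own order
dmy("03-10-2014")   # day-month-year
mdy("10/03/2014")   # month-day-year
ymd("2014-10-03")   # year-month-day

# dmy() also understands written-out month names (in an English locale)
dmy("3 October 2014")
```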
## write
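A minimal sketch of writing data out with readr's write_csv(), the counterpart to read_csv(). It assumes a temporary file path (swap in your own). write_csv() can't store list columns like films, so the example selects plain columns first, then reads the file back to check the round trip.

```{r}
library(dplyr)
library(readr)

# A temporary path so nothing is left behind; use a real path in practice
out_path <- tempfile(fileext = ".csv")

# Keep only plain (non-list) columns before writing
starwars_small <- starwars |>
  select(name, height, mass)

write_csv(starwars_small, out_path)

# Read it back to confirm the round trip
starwars_back <- read_csv(out_path, show_col_types = FALSE)
```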
## filter
filter() returns only rows that meet logical criteria you specify.
```{r}
starwars |>
  filter(species == "Human")
```

```{r}
starwars |>
  filter(height > 200)
```

```{r}
starwars |>
  filter(str_detect(hair_color, "auburn"))
```
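Conditions can be combined. Separating tests with commas works like & (every test must be TRUE), while | keeps rows where either test is TRUE. filter() also drops rows where a test comes out NA, such as rows with a missing height. A sketch:

```{r}
library(dplyr)

# Comma-separated conditions: both must be TRUE
tall_humans <- starwars |>
  filter(species == "Human", height > 180)

# The same thing written with &
tall_humans_and <- starwars |>
  filter(species == "Human" & height > 180)

# | keeps rows that match either condition
humans_or_droids <- starwars |>
  filter(species == "Human" | species == "Droid")
```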
## summarize
summarize() builds a summary table about your data. You can count rows with n() or do math on numeric values, like mean(). The British spelling summarise(), used below, works the same way. In the next chapter we will summarize with math functions.

```{r}
starwars |>
  summarise(
    avg_height = mean(height, na.rm = TRUE)
  )
```
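Why na.rm = TRUE? If any value is missing, mean() returns NA unless you tell it to remove the missing values first. A small made-up example:

```{r}
library(dplyr)

# Hypothetical data with one missing value
scores <- tibble(x = c(1, 2, NA))

result <- scores |>
  summarise(
    avg_keep_na = mean(x),               # NA, because one value is missing
    avg_drop_na = mean(x, na.rm = TRUE), # 1.5, the mean of 1 and 2
    rows = n()                           # 3, n() counts every row, NA included
  )

result
```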
## group_by with mean, n

group_by() is often used with summarize() to put data into groups before building a summary table based on the groups.

```{r}
starwars |>
  group_by(species) |>
  summarise(
    avg_height = mean(height, na.rm = TRUE) |> round_half_up(),
    numb_chars = n()
  )
```
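For the common case of counting rows per group, dplyr's count() is a shortcut for group_by() followed by summarise(n = n()). A sketch on made-up characters (names and heights are hypothetical):

```{r}
library(dplyr)

# Hypothetical characters
chars <- tibble(
  species = c("Human", "Droid", "Human", "Wookiee", "Human"),
  height = c(180, 96, 170, 228, 175)
)

# Long form: group, then summarise within each group
by_group <- chars |>
  group_by(species) |>
  summarise(n = n(), avg_height = mean(height))

# Shortcut when you only need the counts
counts <- chars |>
  count(species)
```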
### Rounding test
```{r}
round(11.5)
round(12.5)
round_half_up(11.5)
round_half_up(12.5)
```
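Running both functions over several halves shows the pattern: base R's round() sends a .5 to the nearest even number, while janitor's round_half_up() always rounds it up.

```{r}
library(janitor)

halves <- c(0.5, 1.5, 2.5, 3.5)

round(halves)          # 0 2 2 4: halves go to the even neighbor
round_half_up(halves)  # 1 2 3 4: halves always round up
```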
## distinct
distinct() returns the unique values, or unique combinations of values, in the columns you specify. In other words, it deduplicates the data.

```{r}
starwars |>
  distinct(hair_color, eye_color)
```
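With no columns named, distinct() drops rows that are exact duplicates. With .keep_all = TRUE it keeps the other columns too, taken from the first row of each unique combination. A made-up example:

```{r}
library(dplyr)

# Hypothetical data where row 3 duplicates row 2
looks <- tibble(
  name = c("A", "B", "B", "C"),
  hair_color = c("brown", "black", "black", "brown"),
  eye_color = c("blue", "brown", "brown", "blue")
)

# Drop exact duplicate rows: 3 rows remain
unique_rows <- looks |> distinct()

# Unique hair/eye combinations, keeping the first name for each
combos <- looks |> distinct(hair_color, eye_color, .keep_all = TRUE)
```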
## slice
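slice() picks rows by position. Its variants slice_sample(), slice_max(), and slice_min() pick rows at random or by the largest or smallest values of a column. After group_by(), they work within each group.

```{r}
starwars |>
  slice_sample(n = 4)
```

```{r}
starwars |>
  select(name, species, mass) |>
  group_by(species) |>
  slice_max(mass)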
```
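slice_max() and slice_min() take an n argument when you want more than one row, and slice_sample() returns different rows on each run unless you set a seed. A sketch with made-up masses (chosen with no ties, since ties can return extra rows):

```{r}
library(dplyr)

# Hypothetical masses
masses <- tibble(name = c("A", "B", "C", "D"), mass = c(80, 100, 45, 60))

heaviest_two <- masses |> slice_max(mass, n = 2)  # B then A, largest first
lightest <- masses |> slice_min(mass)             # C

# set.seed() makes the random pick repeatable
set.seed(42)
pick <- masses |> slice_sample(n = 2)
```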
## case categories
Creating simplified category variables with the case_ functions, which test values and assign new ones.

### case_match

Use when you look at a single column/variable and recode it based on its values.

```{r}
starwars |>
  mutate(
    species_simple = case_match(
      species,
      "Human" ~ "Human",
      .default = "Other"
    )
  ) |>
  select(name, starts_with("species"))
```
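case_match() (added in dplyr 1.1.0) also works outside mutate() on a plain vector, and one line can map several values at once by listing them in c(). A sketch with a made-up vector and made-up labels:

```{r}
library(dplyr)

species_vec <- c("Human", "Droid", "Wookiee", "Human")

# Several values on the left of ~ share one new label
grouped <- case_match(
  species_vec,
  c("Human", "Wookiee") ~ "Organic",
  "Droid" ~ "Machine",
  .default = "Other"
)

grouped
```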
### case_when
When you need more complicated logic that might look at more than one column (though this example doesn't).

```{r}
starwars |>
  mutate(
    species_simple = case_when(
      species == "Human" ~ "Human",
      species == "Droid" ~ "Droid",
      .default = "Other"
    )
  ) |>
  select(name, starts_with("species"))
```
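Here is case_when() looking at two columns. Conditions are checked top to bottom and the first TRUE one wins, so put the most specific test first. The data and the 190 cm cutoff are made up:

```{r}
library(dplyr)

chars <- tibble(
  species = c("Human", "Human", "Droid"),
  height = c(202, 160, 96)
)

sized <- chars |>
  mutate(
    size_class = case_when(
      species == "Human" & height > 190 ~ "Tall human",  # checked first
      species == "Human" ~ "Human",                       # remaining humans
      .default = "Other"
    )
  )

sized
```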
## c
```{r}
species_short <-
  c(
    "Human",
    "Droid"
  )

starwars |>
  filter(species %in% species_short) |>
  select(name, species)
```
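c() combines values into a vector, and %in% asks, for each value on the left, whether it appears anywhere in that vector. That makes it a shorter way to write several == tests joined with |. One difference: %in% returns FALSE for NA where == returns NA, but filter() drops both, so the rows match. A sketch:

```{r}
library(dplyr)

species_short <- c("Human", "Droid")

with_in <- starwars |>
  filter(species %in% species_short)

with_or <- starwars |>
  filter(species == "Human" | species == "Droid")

# %in% on a plain vector: NA is simply FALSE
c("Human", "Ewok", NA) %in% species_short
```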
## nrow
```{r}
starwars |> nrow()
```
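nrow() counts the rows of a data frame, which makes it handy at the end of a pipeline to see how many rows a filter kept. A quick check against a direct count, using sum() on a logical test (sum() counts the TRUE values):

```{r}
library(dplyr)

n_droids <- starwars |>
  filter(species == "Droid") |>
  nrow()

# na.rm = TRUE skips characters with a missing species
n_droids_check <- sum(starwars$species == "Droid", na.rm = TRUE)

n_droids
```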
---
> at this point I dunno if we do in class anymore?
## ggplot
## reorder
## pivot
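pivot_wider() and pivot_longer()? Related: unnest() from tidyr expands a list column like films into one row per character per film.

```{r}
starwars |>
  select(name, films) |>
  unnest(films)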
```
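A minimal sketch of the two pivots on made-up data: pivot_longer() stacks measurement columns into name/value pairs, and pivot_wider() spreads them back out. The column names and numbers are hypothetical.

```{r}
library(tibble)
library(tidyr)

wide <- tibble(
  name = c("A", "B"),
  height = c(150, 200),
  mass = c(50, 100)
)

# Wide to long: one row per name and measurement
long <- wide |>
  pivot_longer(cols = c(height, mass), names_to = "measure", values_to = "value")

# Long back to wide: measure values become column names again
back <- long |>
  pivot_wider(names_from = measure, values_from = value)
```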