forked from swcarpentry/r-novice-gapminder
---
layout: page
title: R for reproducible scientific analysis
subtitle: Vectorisation
minutes: 30
---
```{r, include=FALSE}
source("tools/chunk-options.R")
opts_chunk$set(fig.path = "fig/09-vectorisation-")
# Silently load in the data so the rest of the lesson works
gapminder <- read.csv("data/gapminder-FiveYearData.csv", header=TRUE)
library(ggplot2)
```
> ## Learning Objectives {.objectives}
>
> * To understand vectorised operations in R.
>
Most of R's functions are vectorised, meaning that the function will
operate on all elements of a vector without needing to loop through
and act on each element one at a time. This makes code more
concise, easier to read, and less error-prone.
```{r}
x <- 1:4
x * 2
```
The multiplication happened to each element of the vector.
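To see what vectorisation saves us, here is a sketch of the same doubling written as an explicit loop. The name `doubled` is ours, introduced only for this illustration.

```{r}
x <- 1:4
# Pre-allocate the result, then fill it one element at a time
doubled <- numeric(length(x))
for (i in seq_along(x)) {
  doubled[i] <- x[i] * 2
}
doubled
# The loop and the vectorised expression give the same answer
identical(doubled, x * 2)
```

The loop needs five lines and an index variable to do what `x * 2` does in one expression.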
We can also add two vectors together:
```{r}
y <- 6:9
x + y
```
Each element of `x` was added to its corresponding element of `y`:
```{r, eval=FALSE}
x:  1  2  3  4
    +  +  +  +
y:  6  7  8  9
---------------
    7  9 11 13
```
> ## Challenge 1 {.challenge}
>
> Let's try this on the `pop` column of the `gapminder` dataset.
>
> Make a new column in the `gapminder` data frame that
> contains population in units of millions of people.
> Check the head or tail of the data frame to make sure
> it worked.
>

> ## Challenge 2 {.challenge}
>
> On a single graph, plot population, in
> millions, against year, for all countries. Don't worry about
> identifying which country is which.
>
> Repeat the exercise, graphing only for China, India, and
> Indonesia. Again, don't worry about which is which.
>
Comparison operators, logical operators, and many functions are also
vectorised:
**Comparison operators**
```{r}
x > 2
```
**Logical operators**
```{r}
a <- x > 3 # or, for clarity, a <- (x > 3)
a
```
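The chunk above builds a logical vector with a comparison. The logical operators `&` (and), `|` (or), and `!` (not) are themselves vectorised and combine such vectors element by element. A brief sketch using the same `x`:

```{r}
x <- 1:4
(x > 1) & (x < 4)   # TRUE only where both comparisons hold
(x < 2) | (x > 3)   # TRUE where at least one comparison holds
!(x > 2)            # flips each element of the comparison
```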
> ## Tip: some useful functions for logical vectors {.callout}
>
> * `any()` will return `TRUE` if *any* element of a vector is `TRUE`.
> * `all()` will return `TRUE` if *all* elements of a vector are `TRUE`.
>
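A short sketch of both functions, applied to comparisons on the same `x` used earlier:

```{r}
x <- 1:4
any(x > 3)   # at least one element is greater than 3
all(x > 3)   # not every element is greater than 3
all(x > 0)   # every element is positive
```

Each call collapses a whole logical vector into a single `TRUE` or `FALSE`, which is handy inside an `if` condition.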
Most functions also operate element-wise on vectors:
**Functions**
```{r}
x <- 1:4
log(x)
```
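Functions you write yourself behave the same way when their bodies use only vectorised operations. As a minimal sketch, `rescale01()` below is a hypothetical helper (it is not part of this lesson's data or of base R) that maps a numeric vector onto the range 0 to 1:

```{r}
# Hypothetical helper: subtraction and division are vectorised,
# so the function works on a whole vector without a loop
rescale01 <- function(v) {
  (v - min(v)) / (max(v) - min(v))
}
rescale01(c(2, 4, 6, 10))
```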
Vectorised operations work element-wise on matrices:
```{r}
m <- matrix(1:12, nrow=3, ncol=4)
m * -1
```
> ## Tip: element-wise vs. matrix multiplication {.callout}
>
> Very important: the operator `*` gives you element-wise multiplication!
> To do matrix multiplication, we need to use the `%*%` operator:
>
> ```{r}
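> # (3 x 4) %*% (4 x 1): a 3 x 1 matrix holding the row sums of m
> m %*% matrix(1, nrow=4, ncol=1)
> # (1 x 4) %*% (4 x 1): a 1 x 1 matrix holding the dot product
> matrix(1:4, nrow=1) %*% matrix(1:4, ncol=1)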
> ```
>
> For more on matrix algebra, see the [Quick-R reference
> guide](http://www.statmethods.net/advstats/matrix.html).
>

> ## Challenge 3 {.challenge}
>
> Given the following matrix:
>
> ```{r}
> m <- matrix(1:12, nrow=3, ncol=4)
> m
> ```
>
> Write down what you think will happen when you run:
>
> 1. `m ^ -1`
> 2. `m * c(1, 0, -1)`
> 3. `m > c(0, 20)`
> 4. `m * c(1, 0, -1, 2)`
>
> Did you get the output you expected? If not, ask a helper!
>

> ## Challenge 4 {.challenge}
>
> We're interested in looking at the sum of the
> following sequence of fractions:
>
> ```{r, eval=FALSE}
> x = 1/(1^2) + 1/(2^2) + 1/(3^2) + ... + 1/(n^2)
> ```
>
> This would be tedious to type out, and impractical for large values of
> `n`. Use vectorisation to compute `x` when n=100. What is the sum when
> n=10,000?
>

## Challenge solutions
> ## Solution to challenge 1 {.challenge}
>
> Let's try this on the `pop` column of the `gapminder` dataset.
>
> Make a new column in the `gapminder` data frame that
> contains population in units of millions of people.
> Check the head or tail of the data frame to make sure
> it worked.
>
> ```{r}
> gapminder$pop_millions <- gapminder$pop / 1e6
> head(gapminder)
> ```
>

> ## Solution to challenge 2 {.challenge}
>
> Refresh your plotting skills by plotting population in millions against year.
>
> ```{r ch2-sol}
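> # All countries: population (in millions) against year
> plot(gapminder$year, gapminder$pop_millions)
> # Keep only the rows for the three countries of interest
> countryset <- c('China', 'India', 'Indonesia')
> y <- gapminder[gapminder$country %in% countryset, ]
> plot(y$year, y$pop_millions)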
> ```
>

> ## Solution to challenge 3 {.challenge}
>
> Given the following matrix:
>
> ```{r}
> m <- matrix(1:12, nrow=3, ncol=4)
> m
> ```
>
> Write down what you think will happen when you run:
>
> 1. `m ^ -1`
>
> ```{r, echo=FALSE}
> m ^ -1
> ```
>
> 2. `m * c(1, 0, -1)`
>
> ```{r, echo=FALSE}
> m * c(1, 0, -1)
> ```
>
> 3. `m > c(0, 20)`
>
> ```{r, echo=FALSE}
> m > c(0, 20)
> ```
>
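The fourth expression from the challenge, `m * c(1, 0, -1, 2)`, has no worked answer above. As a sketch of what to expect: R recycles the shorter vector down the columns of `m`, and because the 12 elements of `m` are an exact multiple of 4, it does so without a warning.

```{r}
m <- matrix(1:12, nrow=3, ncol=4)
# The 4-element vector is reused three times, filling column by column
m * c(1, 0, -1, 2)
```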
> ## Solution to challenge 4 {.challenge}
>
> We're interested in looking at the sum of the
> following sequence of fractions:
>
> ```{r, eval=FALSE}
> x = 1/(1^2) + 1/(2^2) + 1/(3^2) + ... + 1/(n^2)
> ```
>
> This would be tedious to type out, and impossible for
> high values of n.
> Can you use vectorisation to compute x, when n=100?
> How about when n=10,000?
>
> ```{r}
> sum(1/(1:100)^2)
> sum(1/(1:1e04)^2)
> n <- 10000
> sum(1/(1:n)^2)
> ```
>
> We can also obtain the same results using a function:
> ```{r}
> inverse_sum_of_squares <- function(n) {
>   sum(1/(1:n)^2)
> }
> inverse_sum_of_squares(100)
> inverse_sum_of_squares(10000)
> n <- 10000
> inverse_sum_of_squares(n)
> ```
>
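As an optional sanity check that goes slightly beyond the challenge: the infinite version of this sum is known to converge to pi^2/6 (the Basel problem), so the vectorised partial sum should sit just below that value. The names `partial` and `limit` are ours, used only for this sketch.

```{r}
n <- 10000
partial <- sum(1 / (1:n)^2)   # vectorised partial sum
limit <- pi^2 / 6             # value of the infinite sum
# The gap shrinks roughly like 1/n as n grows
c(partial = partial, limit = limit, gap = limit - partial)
```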