---
title: "Notes for Chapters 1 to 3"
author: "Laura"
date: "7/20/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Chapter 1: Installing the packages that will be needed
```{r packages, message=FALSE, warning=FALSE}
library(tidyverse)
```
## Chapter 2: Introduction
## Chapter 3: Data visualisation
* Cool additional reading: [The Layered Grammar of Graphics](http://vita.had.co.nz/papers/layered-grammar.pdf).
* `ggplot()` creates a coordinate system to which layers are added.
* The geom functions in ggplot2 take a `mapping` argument, which defines how variables in your dataset are mapped to visual properties.
```{r ggplot1, message=FALSE, warning=FALSE}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
```
### 3.2.4 Exercises
```{r ex3241}
ggplot(data = mpg)
mpg
?mpg
ggplot(mpg) + geom_point(mapping = aes(y = hwy, x = cyl))
ggplot(mpg) + geom_point(mapping = aes(y = class, x = drv))
```
The plot of `class` vs `drv` is not useful: both variables are categorical, so the observations pile up on top of one another at a handful of positions.
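One way to see the overplotting, as a minimal sketch: count the rows per `drv`/`class` combination, then let `geom_count()` size each point by the number of observations sharing that position. The object name `combos` is only for illustration.

```{r ex3241-overplot}
library(tidyverse)

# Count how many rows fall on each (drv, class) combination
combos <- count(mpg, drv, class)
combos

# geom_count() maps the number of overlapping observations to point size
ggplot(mpg) +
  geom_count(mapping = aes(x = drv, y = class))
```

`geom_jitter()` would be another option if seeing the individual points matters more than seeing the counts.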
### 3.3 Aesthetic mappings
An aesthetic is a visual property of the objects in your plot. It can be the x and y location of a point, a color, a shape, etc. To map an aesthetic to a variable, associate the name of the aesthetic with the name of the variable inside `aes()`. ggplot2 automatically assigns a unique level of the aesthetic (e.g. a unique color) to each unique value of the variable, a process called _scaling_.
```{r color}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
```
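To check the scaling step rather than take it on faith, here is a small sketch that pulls the scaled data back out of the plot with `layer_data()`. The names `p` and `scaled` are made up for this example.

```{r color-scaling}
library(tidyverse)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# layer_data() returns the layer's data after ggplot2 has applied its scales;
# the colour column holds the colour assigned to each point
scaled <- layer_data(p)

# One colour per distinct value of class
n_distinct(scaled$colour)
n_distinct(mpg$class)
```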
```{r size}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = class))
```
ggplot2 only uses six shapes at a time by default. Additional groups go unplotted when you use the shape aesthetic, which is what happens to the SUVs here, and ggplot2 warns you about it.
```{r shape}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = class))
```
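A possible workaround, sketched under the assumption that seven distinguishable shapes are acceptable: pass one shape code per class to `scale_shape_manual()`. The particular codes below are an arbitrary choice, and `p_shapes` is just an illustrative name.

```{r shape-manual}
library(tidyverse)

# Supply one shape code per class (mpg has 7 classes) so none is dropped
p_shapes <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = c(15, 16, 17, 18, 3, 4, 8))
p_shapes
```

With that many shapes the plot gets hard to read, so color is probably still the better aesthetic for `class`.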
```{r ggplot33}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
```
### 3.3.1 Exercises
```{r ex331}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = 1:234))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = "blue"))
mpg
```
1. The points are not blue because `color = "blue"` sits inside `aes()`. There, `"blue"` is not read as a color name; it is treated as data, a constant variable whose only value is the string `"blue"`. ggplot2 scales that single level to the first color of its default palette, which in this case is pinkish. To get blue points, set `color = "blue"` outside `aes()`.
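The difference can be inspected directly. This is a minimal sketch that builds both versions and looks at the colour each point ends up with; `p_inside` and `p_outside` are names invented here.

```{r ex331-constant}
library(tidyverse)

# Inside aes(): "blue" is treated as data, a constant with a single level
p_inside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Outside aes(): "blue" is used directly as the colour of the points
p_outside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

unique(layer_data(p_inside)$colour)   # the scale's first default colour
unique(layer_data(p_outside)$colour)  # "blue"
```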
```{r ex332}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = displ))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = displ))
# ggplot(data = mpg) +
# geom_point(mapping = aes(x = displ, y = hwy, shape = displ))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = trans))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = trans))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = trans))
```
```{r ex333}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = displ, size = displ))
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(aes(linetype = drv))
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(shape = 21, size = 3, stroke = 3, fill = "grey", show.legend = FALSE) +
geom_smooth(aes(linetype = drv, color = drv), size = 3, se = FALSE, show.legend = FALSE)
?geom_point
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))
```
### 3.5 Facets
```{r facets}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
```
```{r facets1}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
```
```{r facets2}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ class)
```
### 3.5.1 Exercises
```{r ex351}
ggplot(data = mpg) +
geom_point(mapping = aes(x = cyl, y = hwy)) +
facet_grid(. ~ displ)
```
```{r ex3512}
ggplot(data = mpg) +
geom_point(mapping = aes(x = drv, y = cyl))
```
```{r ex353}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ .)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
```
### 3.6 Geometric objects
A geom is the geometrical object that a plot uses to represent data. Every geom function in ggplot2 takes a mapping argument. Not every aesthetic works with every geom.
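As a rough illustration of the last two points: the same `displ`/`hwy` mapping can be handed to different geoms, while `linetype` suits a line geom and `shape` suits points. The names `p_lines` and `p_points` are only for this sketch.

```{r geoms-compare}
library(tidyverse)

# Same data and x/y mapping, two different geoms
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

# linetype is meaningful for a line geom such as geom_smooth() ...
p_lines <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
p_lines

# ... whereas points have a shape rather than a linetype
p_points <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = drv))
p_points
```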
### 3.6.1 Exercises
```{r ex3612}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
```
```{r ex3613}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
ggplot() +
geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
```
```{r ex3616}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(size = 3) +
geom_smooth(se = FALSE)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(size = 3) +
geom_smooth(aes(group = drv), se = FALSE)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point(size = 5) +
geom_smooth(se = FALSE)
```
### 3.7 Statistical Transformations
```{r transf}
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
?geom_bar
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
```
The algorithm used to calculate new values for a graph is called a stat, short for statistical transformation.
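To make the stat less of a black box, here is a sketch that does the counting by hand with `count()`, plots the result with `geom_col()`, and compares it with what `geom_bar()` computes internally. `cut_counts` and `p_bar` are illustrative names.

```{r stat-by-hand}
library(tidyverse)

# What stat_count() computes: one row per cut with the number of diamonds
cut_counts <- count(diamonds, cut)
cut_counts

# geom_col() plots the precomputed heights as they are
ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = n))

# The counts geom_bar() computes internally should match the manual ones
p_bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
layer_data(p_bar)$count
```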
### 3.7.1 Exercises
```{r ex371}
?geom_boxplot
?geom_errorbar
?geom_col
?stat_smooth
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop..))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group = 0.5))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group = 2))
```
*Q* What is the default geom associated with `stat_summary()`? And in exercise 3, what is the deal with `group = 1`? And what happens when I add `fill = color`?
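A partial answer, as far as I can tell: `stat_summary()` uses `geom = "pointrange"` by default, and `group = 1` matters because the proportion is computed within each group. The sketch below compares the computed `prop` values with and without it. It uses `after_stat(prop)`, the newer spelling of `..prop..`, which assumes ggplot2 3.3.0 or later; `p_each` and `p_all` are made-up names.

```{r prop-group}
library(tidyverse)

# Without group = 1, each cut is its own group, so every proportion is 1
p_each <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

# With group = 1, all cuts share one group and the proportions sum to 1
p_all <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

layer_data(p_each)$prop
layer_data(p_all)$prop
```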
### 3.8.1 Exercises
I skipped these.
### 3.9 Coordinate systems
### 3.9.1 Exercises
```{r}
ggplot(mpg, aes(x = cty, y = hwy)) +
geom_point() +
geom_abline() +
coord_fixed() # keeps the scale the same on both axes
```
```{r}
ggplot(mpg, aes(fill = drv)) +
geom_bar(aes(factor(1)))
ggplot(mpg, aes(x = factor(1), fill = drv)) +
geom_bar() +
coord_polar(theta = "y")
```
### First Quiz
```{r}
ggplot(diamonds, aes(carat, price)) +
geom_point() +
geom_smooth(aes(color = cut), se = FALSE) +
theme_light() +
labs(title = "Ideal cut diamonds command the best price for every carat size",
subtitle = "Lines show GAM estimate of mean values for each level of cut",
caption = "Data provided by Hadley Wickham",
x = "Log Carat Size",
y = "Log Price Size") +
scale_color_brewer(palette = "Greens", name = "Cut Rating", labels = c("Fair", "Good", "Very Good", "Premium", "Ideal")) +
scale_x_log10() +
scale_y_log10()
```