forked from psych251/problem_sets
-
Notifications
You must be signed in to change notification settings - Fork 0
/
ggplot2_exercise.Rmd
154 lines (104 loc) · 5.07 KB
/
ggplot2_exercise.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
---
title: 'ggplot2 intro exercise'
author: "Mike Frank"
date: "November 13, 2019"
output: html_document
---
This is an in-class exercise on using `ggplot2`.
Note, that this example is from the_grammar.R on [http://had.co.nz/ggplot2](). I've adapted this for psych 251 purposes
First install and load the package. It's part of the "core tidvyerse".
```{r}
library(tidyverse)
```
# Exploring ggplot2 using `qplot`
We'll start by using `qplot`. `qplot` is the easy interface, meant to replace `plot`. You can give it simple `qplot(x,y)` examples, or slightly more complex examples like `qplot(x, y, col=grp, data=d)`.
We're going to be using the `diamonds` dataset. This is a set of measurements of diamonds, along with their price etc.
```{r}
head(diamonds)
qplot(diamonds$carat, diamonds$price)
```
```{r}
qplot(diamonds$price)
```
Scatter plots are trivial, and easy to add features to. Modify this plot so that it uses the dataframe rather than working from variables in the general namespace (good to get away from retyping `diamonds$` every time you reference a variable).
```{r}
qplot(data = diamonds, price)
```
Try adding clarity and cut, using shape and color as your visual variables.
```{r}
qplot(data = diamonds,
x = carat, y = price, shape = clarity, color = cut)
```
# More complex with `ggplot`
`ggplot` is just a way of building `qplot` calls up more systematically. It's
sometimes easier to use and sometimes a bit more complicated. What I want to show off here is the functionality of being able to build up complex plots with multiple elements. You can actually do this using `qplot` pretty easily, but there are a few things that are hard to do.
`ggplot` is the basic call, where you specify A) a dataframe and B) an aesthetic (`aes`) mapping from variables in the plot space to variables in the dataset.
```{r}
p <- ggplot(diamonds, aes(x=carat, y=price)) # first you set the aesthetic and dataset
p <- p + geom_point() # then you add geoms
p <- p + geom_point(aes(colour = carat)) # and you can keep doing this to add layers to the plot
p
```
Try writing this as a single set of additions (e.g. one line of R code, though you can put in linebreaks). This is the most common workflow for me.
```{r}
ggplot(diamonds,
aes(x = carat, y = price)) +
geom_point(aes(colour = carat))
```
# Facets
Let's try out another version of the `qplot` example above. Rewrite the last qplot example with ggplot.
```{r}
ggplot(diamonds,
aes(x = carat, y = price,
shape = clarity, color = cut)) +
geom_point()
```
One of the primary benefits of `ggplot2` is the use of facets - also known as small multiples in the Tufte vocabulary. That last plot was probably hard to read. Facets could make it better. Try adding `facet_grid(x ~ y)`. `x ~ y` means row facets are by `x`, column facets by `y`.
```{r}
ggplot(diamonds,
aes(x = carat, y = price)) +
geom_point() +
facet_grid(clarity ~ cut)
```
But facets can also get overwhelming. Try to strike a good balance between color, shape, and faceting.
HINT: `facet_grid(. ~ x)` puts x on the columns, but `facet_wrap(~ x)` (no dot) *wraps* the facets.
```{r}
p <- ggplot(diamonds,
aes(x = carat, y = price)) +
geom_point(aes( color = clarity)) +
geom_smooth(method="lm") +
facet_wrap(~cut)
```
# Geoms
As you've seen above, the basic unit of a ggplot plot is a "geom" - a mapping between data (via an "aesthetic") and a particular geometric configuration on coordinate axes.
Let's try adding some geoms and manipulating their parameters. One combo I really like is a scatterplot with a smoothing layer (`geom_smooth`). Try adding one onto this plot.
```{r}
ggplot(diamonds, aes(x=carat, y=price)) +
geom_point(shape = ".") +
facet_grid(cut ~ clarity)
```
CHALLENGE: You could also try starting with a histogram and adding densities. (`geom_density`), Try [this link](https://stackoverflow.com/questions/5688082/ggplot2-overlay-histogram-with-density-curve).
```{r}
```
# Themes and plot cleanup
I like a slightly cleaner look to my plots. Luckily, ggplot allows you to add "themes" to your plots. Try doing the same plot but adding `+ theme_bw()` or `+ theme_classic()`. Different themes work better for different applications, in my experience. My favorite right now is `ggthemes::theme_few`.
You can also try different color scales. Experiment with `scale_color_...` - try writing that and hitting TAB for autocomplete options. Check out the help on this.
You can also try transparency/different point sizes to clean up scatterplots. Try `alpha = .1` or `pch = "."` to make transparent or small points.
Finally, don't forget to "fix the axis labels"!
Here's a somewhat ugly plot - see if you can make it look awesome.
```{r}
ggplot(diamonds, aes(x = carat, y = price,
col = cut)) +
geom_point() +
facet_wrap(~clarity)
```
```{r}
ggplot(diamonds, aes(x = carat, y = price,
col = clarity)) +
geom_point(alpha = .1) +
facet_wrap(~cut) +
langcog::theme_mikabr() +
theme(legend.position = "bottom") +
ylab("Price ($)") +
xlab("Carat")
```