-
Notifications
You must be signed in to change notification settings - Fork 3
/
plotting_grab_bag_INCLASS.Rmd
182 lines (128 loc) · 5.63 KB
/
plotting_grab_bag_INCLASS.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
```{r setup, include=FALSE}
# load libraries
library(tidyverse)
library(conflicted)
library(viridis)
library(magrittr)
library(cowplot)
# configure knit settings
knitr::opts_chunk$set(echo = TRUE, fig.width = 6, fig.height = 4)
# resolve package conflicts
filter <- dplyr::filter
select <- dplyr::select
```
# Plotting Grab Bag
For a relaxed meetup before Thanksgiving, we'll take a look at some useful plot code that's more specific for academic and biological representations.
## `cowplot`
<br>
`cowplot` goes on top of `ggplot` to create publication-ready plots. It has some minor tweaks to the default `ggplot` theme, but what most people use it for is to arrange plots in a grid.
<br>
If you haven't already, install `cowplot` by uncommenting the code in the chunk below and running its.
```{r}
#install.packages('cowplot')
```
<br>
**Make a bunch of plots** for making a grid in the next chunk.
```{r}
# sepal length vs sepal width
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point(size = 3) +
labs(x = 'Sepal Length (cm)', y = 'Sepal Width (cm)') +
theme_classic() -> iris1
# sepal length vs petal length
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
geom_point(size = 3) +
labs(x = 'Sepal Length (cm)', y = 'Petal Length (cm)') +
theme_classic() -> iris2
# sepal length vs petal width
ggplot(iris, aes(x = Sepal.Length, y = Petal.Width, color = Species)) +
geom_point(size = 3) +
labs(x = 'Sepal Length (cm)', y = 'Petal Width (cm)') +
theme_classic() -> iris3
```
<br>
**Plot together** using `cowplot::plot_grid()`
```{r}
# basic
plot_grid(iris1, iris2, iris3)
# add labels
plot_grid(iris1, iris2, iris3, labels = c('A', 'B', 'C'))
# specify number of columns/rows
plot_grid(iris1, iris2, iris3, labels = c('A', 'B', 'C'), ncol = 1)
plot_grid(iris1, iris2, iris3, labels = c('A', 'B', 'C'), nrow = 1)
```
<br><br>
## Plotting Differential Expression
We're going to practice plotting a volcano plot and an MA plot for differential expression.
<br>
First, read in a fake-ish differential expression table to practice plotting with. (Counts are from the gilad count table downloaded from the ReCount RNAseq website <http://bowtie-bio.sourceforge.net/recount/>)
```{r}
diff_exp <- read_tsv('demo_diff_exp_tbl.tsv')
```
<br>
### Volcano Plot
<br>
**Wrangle table before plotting**
A volcano plot has log2 fold change in RNA expression on the x axis vs. negative log10 p-values on the y-axis with points colored by significance of the p-value. This means we need three columns to plot:
1. log2 fold change in expression (logFC)
2. -log10 pvalues
3. a categorical column for coloring by significance
Log2 fold change is usually returned by differential expression packages and is in our table, the logFC column. However, we only have p-values in their normal small decimal format. They can be converted into -log10 inside of ggplot, but it makes plotting faster to add a column to the table first. Then we can add a categorical significance column for coloring by.
```{r}
diff_exp %>%
mutate(log_pvalue = -log10(PValue),
significant = ifelse(-log10(PValue) > -log10(0.05) & (logFC < -2 | logFC > 2),
'sig', 'notsig')) -> diff_exp_volcplot
```
<br>
**Plot a volcano plot**
```{r}
ggplot(diff_exp_volcplot, aes(x = logFC, y = log_pvalue)) +
geom_point(aes(color = significant)) +
scale_color_manual(name = '',
values = c('black', 'red'),
labels = c('not significant', 'significant (< 0.05)')) +
geom_hline(yintercept = -log10(0.05),
linetype = 'dashed',
color = 'grey40') +
geom_vline(xintercept = c(-2, 2),
linetype = 'dashed',
color = 'grey40') +
labs(x = 'Log2 Fold Change in Expression', y = '-Log10 P-Value') +
theme_classic() +
theme(axis.title = element_text(size = 14),
legend.text = element_text(size = 12))
```
<br><br>
### MA Plot
<br>
**Wrangle table before plotting**
An MA plot has average expression on the x axis and log fold change in expression on the y axis. This means the three columns we need to plot are:
1. mean expression
2. log2 fold change in expression (logFC)
3. a categorical column for coloring by significance
We already know from the volcano plot example that the table contains fold change in logFC and how to make a categorical significance column for coloring the points. Mean expression is also in our table, the logCPM column. LogCPM the is the log10 average mRNA count per million reads (basically logged average concentration). So we just need to add on the categorical significance column again.
```{r}
# tidy calculation
diff_exp %>%
mutate(significant = ifelse(-log10(PValue) > 2 & (logFC < -2 | logFC > 2),
'sig', 'notsig')) -> diff_exp_maplot
```
<br>
**Plot**
```{r}
ggplot(diff_exp_maplot, aes(x = logCPM, y = logFC)) +
geom_point(aes(color = significant)) +
scale_color_manual(values = c('black', 'red'),
labels = c('significant', 'not significant')) +
geom_hline(yintercept = c(-2, 0, 2),
linetype = c('dashed', 'solid', 'dashed'),
color = c('grey40', 'red', 'grey40')) +
labs(x = 'Mean Expression (log10 Counts per Million Reads)',
y = 'Log2 Fold Change in Expression') +
theme_classic() +
theme(legend.title = element_blank(),
axis.title = element_text(size = 14),
legend.text = element_text(size = 12))
```
<br><br>