---
output: html_document
---
```{r setup, include=FALSE}
# load libraries
library(tidyverse)
library(conflicted)
library(viridis)
# configure knit settings
knitr::opts_chunk$set(echo = TRUE, fig.width = 6, fig.height = 4)
# resolve package conflicts
filter <- dplyr::filter
select <- dplyr::select
```
## Import/export
### Working directory
#### `getwd()`
Ideally, keep everything (your data files and your R Markdown file or R scripts) in your working directory. To see where your working directory is, call `getwd()` with no arguments.
```{r}
getwd()
```
<br>
#### `setwd()`
However, if you need to set your working directory to somewhere other than where your R Markdown file or R script lives (not recommended), use `setwd()` with the path to the directory you want to use. (**NOTE:** Any file path I put here will NOT work if you try running it on your computer. That's the downside of `setwd()`. Try setting your own working directory instead.)
```{r}
#setwd('/this_is_a_fake_path/to_show_function_syntax/')
```
<br>
#### `dir()`
You might also want to see what's in your working directory. To do that, use `dir()`.
```{r}
dir()
```
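A relative path such as `demo_files/wine.tsv` is resolved against the working directory, so it can help to check that a file is where you expect before reading it. The sketch below uses a temporary folder and a made-up file name (`example.tsv`) as stand-ins so that it runs anywhere; swap in your own folder and file.

```{r}
# a temporary folder stands in for a project's data folder (assumption for illustration)
data_dir <- file.path(tempdir(), 'demo_dir_example')
dir.create(data_dir, showWarnings = FALSE)

# build the path piece by piece instead of pasting strings together
example_path <- file.path(data_dir, 'example.tsv')
writeLines('a\tb\n1\t2', example_path)

# list the folder contents and confirm the file exists
dir(data_dir)
file.exists(example_path)
```

`dir()` accepts a path, so you can look inside any folder without changing the working directory.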
<br>
## Reading and writing data tables with readr
### Read tables
#### `read_tsv()`
Reads in tab-separated value (TSV) files.
```{r}
wine <- read_tsv('./demo_files/wine.tsv')
```
<br>
#### `read_csv()`
Reads in comma-separated value (CSV) files.
```{r}
sparrows <- read_csv('demo_files/sparrows.csv')
```
<br>
#### `read_delim()`
Reads in files with any delimiter. You have to specify the delimiter with the `delim` argument.
```{r}
biopsy <- read_delim('demo_files/biopsy.txt', delim = ' ')
```
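`read_tsv()` and `read_csv()` are shortcuts for `read_delim()` with the delimiter already filled in. As a rough illustration, the sketch below writes a small pipe-delimited file to a temporary location (the data and file are invented for the example) and reads it back with `delim = '|'`.

```{r}
library(readr)

# invented example data, written to a temporary pipe-delimited file
pipe_file <- tempfile(fileext = '.txt')
writeLines(c('id|group|value',
             '1|control|0.5',
             '2|treated|1.25'),
           pipe_file)

# tell read_delim() which character separates the columns
pipe_data <- read_delim(pipe_file, delim = '|')
pipe_data
```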
<br>
#### Problems
Sometimes there aren't column names, or the column names are in a file header or are not read in properly. This example has column names in a commented header.
```{r}
rowan <- read_csv('demo_files/rowan.csv')
```
<br>
We can solve the problem in two ways: specify the column names and tell `read_csv()` that the header line is a comment, or specify the column names and skip the first line. Both give the same result.
```{r}
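# option 1: supply column names and treat lines starting with '#' as comments
rowan <- read_csv('demo_files/rowan.csv',
                  col_names = c('altitude', 'resp.rate', 'species', 'leaf.len', 'nesting'),
                  comment = '#')
# option 2: supply column names and skip the first line
rowan <- read_csv('demo_files/rowan.csv',
                  col_names = c('altitude', 'resp.rate', 'species', 'leaf.len', 'nesting'),
                  skip = 1)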
```
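`read_*()` functions guess each column's data type from the first 1,000 rows by default. A classic place where this goes wrong is chromosome columns: the first rows often hold only numbers (1, 2, 3, ...), so the column is guessed as numeric and later values such as X and Y fail to parse.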
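One way around type guessing is to state the column types yourself with `col_types`. The sketch below uses invented data (1,500 numbered chromosomes followed by X and Y) in a temporary file; the column names `chrom` and `pos` are made up for the example.

```{r}
library(readr)

# invented data: the first 1,500 rows look numeric, the last two do not
toy <- data.frame(chrom = c(rep(1:22, length.out = 1500), 'X', 'Y'),
                  pos = 1:1502)
chrom_file <- tempfile(fileext = '.csv')
write_csv(toy, chrom_file)

# declare the types up front so nothing is guessed
chrom_fixed <- read_csv(chrom_file,
                        col_types = cols(chrom = col_character(),
                                         pos = col_integer()))
tail(chrom_fixed, 2)
```

If a file has already been read with guessed types, calling `problems()` on the result lists the values that could not be parsed.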
<br>
### Write tables
When you want to save a data table that you've made inside R, you have to write the table to a file.
<br>
#### `write_tsv()`
This will save your table with tabs to delimit the data.
```{r}
# change something about wine: keep only two columns
wine %>% select(Cultivar, Color) -> wine_cult_col
# save it as a tsv
write_tsv(wine_cult_col, 'cultivar_color.tsv')
```
<br>
#### `write_csv()`
This will save your table with commas to delimit the data.
```{r}
wine %>% select(Cultivar, Color) %>% write_csv('cultivar_color.csv')
```
<br>
#### `write_delim()`
You can specify what you want as a delimiter using `write_delim()`.
```{r}
sparrows %>% group_by(Sex, Age, Survival) %>% count() -> sparrow_survival
write_delim(sparrow_survival, 'sparrow_survival.tsv', delim = '\t')
```
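A quick way to check a written file is to read it back and compare it with what you wrote. This is a sketch on a small invented table saved to a temporary file, not one of the demo data sets.

```{r}
library(readr)

# small invented table
counts <- data.frame(sex = c('Female', 'Male'), n = c(10L, 12L))

# write with a semicolon delimiter, then read the file back in
counts_file <- tempfile(fileext = '.txt')
write_delim(counts, counts_file, delim = ';')
counts_back <- read_delim(counts_file, delim = ';')

# same dimensions and column names as what we wrote
dim(counts_back)
names(counts_back)
```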
<br>
## Saving plots
#### base R
```{r}
png('sparrows_weight_by_age_sex.png')
ggplot(sparrows, aes(x = Age, y = Weight, fill = Age)) +
  geom_violin(alpha = 0.8) +
  scale_fill_manual(values = c('darkcyan', 'hotpink')) +
  facet_wrap(~ Sex) +
  theme_classic() +
  theme(legend.position = 'none')
dev.off()
```
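With a base R graphics device, the plot is written to the file only if it is drawn while the device is open, and the file is finished when `dev.off()` is called. Inside a function or a loop, a ggplot object is not drawn automatically, so it needs an explicit `print()`. A minimal sketch using R's built-in `mtcars` data (chosen only so the example runs without the demo files) and a temporary file:

```{r}
library(ggplot2)

# built-in data set, used only so the example is self-contained
car_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  theme_classic()

# open the device, draw the plot explicitly, then close the device
device_file <- tempfile(fileext = '.png')
png(device_file, width = 600, height = 400)
print(car_plot)
dev.off()
file.exists(device_file)
```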
<br>
#### `ggsave()`
ggplot2 has its own function for saving plots, `ggsave()`. By default it saves the last plot that was displayed, or you can specify which plot to save. It also detects the image file type from the extension of the file name you give it.
```{r}
# automatic
ggplot(biopsy, aes(x = marg_adhesion, fill = outcome)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c('darkgray', 'firebrick')) +
  labs(x = 'margin adhesion') +
  theme_classic()
ggsave('biopsy_margins.png')
```
<br>
If you save the plot as an object, you can tell `ggsave()` what plot object you want to save.
```{r}
# specify saved plot
ggplot(wine, aes(x = Alcohol, y = Ash, color = as.factor(Cultivar))) +
  geom_point() +
  scale_color_viridis(discrete = TRUE) +
  labs(color = 'Cultivar') +
  theme_classic() -> wine_plot
ggsave('wine_cult.png', plot = wine_plot)
```
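`ggsave()` also takes `width`, `height`, `units`, and `dpi` arguments; when the size is left out, it falls back to the size of the current graphics device. A minimal sketch using R's built-in `mtcars` data (chosen only so the example runs without the demo files) and a temporary file:

```{r}
library(ggplot2)

# built-in data set, used only so the example is self-contained
mpg_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_classic()

# set the output size explicitly so the saved file does not depend on the device
plot_file <- tempfile(fileext = '.png')
ggsave(plot_file, plot = mpg_plot, width = 6, height = 4, units = 'in', dpi = 300)
file.exists(plot_file)
```

Setting the size explicitly makes the saved figure reproducible across machines and window sizes.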
<br>
## Work on data together
### Import
```{r}
biopsy <- read_tsv('demo_files/biopsy_inclass_demo.tsv')
```
<br>
### Tidy
```{r}
```
<br>
### Visualize
```{r}
```
<br>
### Test
```{r}
```
<br><br>