-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy paththesis-publication.Rmd
569 lines (451 loc) · 21.1 KB
/
thesis-publication.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
---
title: "Assessing Phosphorus Sources with Synoptic Sampling in the Surface Waters of a Mixed-Use, Montane Watershed"
author: "Austin W. Pearce"
date: "Documentation last updated on September 18, 2017"
output:
tufte::tufte_handout
---
# Intro
I consider myself somewhat new to R. I've only been coding for a year. And most of the coding I know is for basic statistics and making simple data visualizations. However, in an effort to better document my thesis data for the next person who might find it, below is some documentation and code for the synoptic sampling data I collected while a master's student at BYU from 2015 - 2017 with Dr. Neil Hansen.
This notebook differs from the other (`thesis.Rmd`) in that graphics here have been revised to reflect simpler, cleaner styles.
# Setup
Most of the packages required for all code below can be installed in one fell swoop by installing the `tidyverse` package. GIS shapefiles were imported using `rgdal`. The `viridis` package provides pleasing color palettes that are also colorblind friendly. `gridExtra` allows multiple ggplots to be arranged side-by-side.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE,
message = FALSE,
warning = FALSE)
library(lubridate)
library(tufte)
library(tidyverse)
library(rgdal)
library(gridExtra)
library(cowplot)
theme_set(
theme_bw(base_family = "serif",
base_size = 14) +
theme(plot.background = NULL,
plot.margin = margin(5, 5, 0, 0, "pt"),
panel.grid = element_blank(),
axis.line = element_line(color = "black"),
axis.text = element_text(color = "black"),
axis.ticks = element_line(color = "black"),
legend.title.align = 0.5,
legend.key.height = unit(x = 2, units = "pt"),
legend.justification=c(1,1), legend.position=c(1,1),
legend.background = element_blank()
)
)
lab.lps <- expression('Streamflow (L s'^-1*')')
```
# The Watershed Map
To effectively orient the reader who views these spatial data, below is code to produce a basemap of the Wallsburg watershed. This basemap is simplified to only include the watershed boundary and the three major tributaries. Values for *Latitude* and *Longitude* are not included on the axis to maintain simplicity, and because the coordinates of the sites do not really matter in the context of this study.
*Note: streams were read in as separate shapefiles because I had issues plotting when all streams were in one shapefile.*
```{r build map, include=FALSE}
# Shapefiles are in WGS 84 lat-long projection
# Main Creek (mc), Spring Creek (sc), and so on
bound <- readOGR('input/shapefiles', 'boundary')
mc <- readOGR('input/shapefiles', 'maincreek')
sc <- readOGR('input/shapefiles', 'springcreek')
lhc <- readOGR('input/shapefiles', 'littlehobble')
lhclf <- readOGR('input/shapefiles', 'littlehobble2')
# build ggplot object for outline of watershed
bound <- geom_polygon(data = bound, aes(x = long, y = lat),
col = "gray50", fill = NA)
# build ggplot objects for the streams
sc <- geom_path(data = sc, aes(long, lat), col = "#cccccc")
mc <- geom_path(data = mc, aes(long, lat), col = "#cccccc")
lhc <- geom_path(data = lhc, aes(long, lat), col = "#cccccc")
lhclf <- geom_path(data = lhclf, aes(long, lat), col = "#cccccc")
# A ggplot basemap of Wallsburg watershed with streams without boundary
map.wallsburg <- ggplot() +
sc + mc + lhc + lhclf +
coord_quickmap() +
labs(x = NULL,
y = NULL) +
theme_tufte() +
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
axis.title = element_text(size = 10),
panel.grid = element_blank(),
legend.text = element_text(size = 10),
strip.text = element_text(size = 10),
legend.position = 'top')
#remove other objects in order to maintain neat R environment
remove(bound, lhc, lhclf, mc, sc)
# Save as RDS for quicker loading next time.
write_rds(map.wallsburg, path = 'input/wallsburg_map_tufte.rds')
```
If the user wants, the previous code can be skipped once the RDS file is created. In this way, the user can skip reading and plotting the shapefiles each time the notebook is run. But it's optional given that the user may want to reproduce the map each time to ensure accuracy in the code.
```{r display-map, fig.width=4, fig.height=5}
map.wallsburg = readRDS('input/wallsburg_map_tufte.rds')
map.wallsburg
```
# Historic Discharge at Outlet
The historical data on the discharge of the Wallsburg Watershed shows clearly the annual flux of water from snowmelt, and helped to determine the sampling dates. Below is the code to generate that plot.
```{r historic-annual-hydrograph, include = FALSE}
# Get data
# separate data for month averaging
outlet <- read_csv('input/outlet.csv') %>%
dplyr::filter(. , TP < 1)
outlet$DOY <- lubridate::yday(outlet$DATE)
outlet_dates <- separate(outlet, col = DATE, into = c("Y", "M", "D"))
outlet_dates$M <- month.abb[as.numeric(outlet_dates$M)]
# not necessary but helpful
outlet_dates$LITERS <- outlet_dates$CFS*28.3168 # convert cfs to L/s
#reorder month names from alphabetic
outlet_dates$M <- factor(outlet_dates$M, levels = month.abb)
```
```{r median-flow-annual}
# median flows for each month from 1984 - 2014
# title = "Median Monthly Discharge for 30 Years: 1984 - 2014"
flow.med <- aggregate(LITERS ~ M, data = outlet_dates, median)
flow.median <- ggplot(data = flow.med,
aes(x = M,
y = LITERS,
group = 1)) +
geom_line() +
geom_point(size = 2) +
scale_y_continuous(breaks = seq(0, 2000, 250)) +
labs(x = "Month", y = expression('Median Discharge (L s'^-1*')'))
flow.median
ggsave('output/pub/median-discharge.png',
width = 7, height = 3, dpi = 1200)
```
Some looking at this might note that the volume of water is relatively little; this is a stream in the western United States, after all, a semi-arid region.
# Streamflow
The streamflow data for this analysis can be found in two CSV files. The `master.csv` file contains the streamflow data as measured in the seven sampling campaigns. The `flowmeans.csv` contains the data as averaged between the two years, such that the *Peak* streamflow value for MC01, for example, is the average streamflow at MC01 based on the measurement from May 2015 and May 2016. In a way, this averaging helps to smooth out year-to-year variability and simplify analysis. More on data collection can be read in the thesis document `thesis.pdf`.
```{r flow-data, include=FALSE}
# All streamflow (flow) data
flow = read_csv("input/master.csv", na = 'no_access') %>%
filter(., site != 'MC04') %>%
select(., id:ls,-cfs) %>%
drop_na(.)
# streamflow data simplified into the three hydrologic periods:
# 'Rise', 'Peak', & 'Base'
flow_means = read_csv("input/flowmeans.csv")
# This variable reorders the three hydrologic periods into a chronological
# order in terms of the annual hydrograph of the watershed
flow_means$hydros = factor(flow_means$hydro,
levels = c('Rise', 'Peak', 'Base'))
# the next data frame provides only the dry sites, which is important to show
# during the baseflow 2015 campaign
flow_dry = filter(flow, ls == 0)
flow_201507 = filter(flow, date == '2015-07-29')
```
As described in the thesis section on traditional synoptic sampling approach, the baseflow sampling campaign on 2015 provided certain insights into the P loading patterns of the watershed. The streamflows are as follows in the plot below, starting with the baseflow streamflow of 2015 and then the streamflows in each of the averaged hydrologic periods.
```{r baseflow-2015, fig.width=4, fig.height=5}
# Baseflow 2015
plot.flow201507 = map.wallsburg +
geom_point(data = flow_201507,
aes(x = long, y = lat, size = ls), # fill inside of point
pch = 16, alpha = 0.1, col = 'black') +
geom_point(data = flow_201507,
aes(x = long, y = lat, size = ls),
col = 'black', pch = 21) +
geom_point(data = flow_dry,
aes(x = long, y = lat, shape = as.factor(ls)),
size = 2) +
scale_size_area(name = 'Streamflow\n(L/s)', max_size = 12) +
scale_shape_discrete(name = '', labels = c('Dry'))
plot.flow201507
ggsave('output/pub/base-flow.png',
width = 5, height = 7, dpi = 1200)
```
```{r flow-over-time}
# bubble plot for each sampling campaign
plt.flow = map.wallsburg +
geom_point(data = flow_means,
aes(x = long, y = lat, size = ls), # fill
pch = 16, alpha = 0.1, col = "black") +
geom_point(data = flow_means,
aes(x = long, y = lat, size = ls),
col = 'black', pch = 21) +
facet_wrap(~ hydros,
labeller = labeller(.multi_line = FALSE)) +
scale_size_area(breaks = c(0, 50, 100, 200, 300),
max_size = 12, name = "Streamlow (L/s)") +
scale_shape_discrete(name = '',
labels = c('No Access'))
plt.flow
ggsave('output/tufte/all-flow.png',
width = 12, height = 6, dpi = 1200)
```
# Concentrations
The concentration data for this analysis can be found in two CSV files. The `master.csv` file contains the concentration data as measured in the seven sampling campaigns. The `concmeans.csv` contains the data as averaged between the two years, such that the *Peak* concentration value for MC01, for example, is the average concentration at MC01 based on the measurement from May 2015 and May 2016. In a way, this averaging helps to smooth out year-to-year variability and simplify analysis. More on data collection can be read in the thesis document `thesis.pdf`.
```{r concentration-data, include=FALSE}
conc = read_csv('input/master.csv', na = c('dry', 'no_access', 'lost')) %>%
filter(., site != 'MC04') %>%
select(., id:dop,-cfs,-ls) %>%
gather(., 'fraction', 'conc', 10:14) %>%
drop_na(.)
conc_mean = read_csv('input/concmeans.csv', na = 'dry') %>%
filter(., site != 'MC04') %>%
gather(., 'fraction', 'conc', 8:12) %>%
drop_na(.)
conc_mean_dry = read_csv('input/concmeans.csv') %>%
filter(., site != 'MC04') %>%
gather(., 'fraction', 'conc', 8:12) %>%
dplyr::filter(., conc == 'dry')
# Create variable that helps reorder P Fractions
conc_mean$fractions = factor(conc_mean$fraction,
levels = c('TP','PP','TDP','DRP','DOP'))
conc_mean$hydros = factor(conc_mean$hydro,
levels = c('Rise', 'Peak', 'Base'))
conc_mean_tp = filter(conc_mean, fraction == 'TP')
```
The concentrations are as follows in the plots below, starting with the baseflow streamflow of 2015, then the concentrations in each of the averaged hydrologic periods, and then the concentrations of each fraction in each hydrologic period.
```{r concentrations}
conc_base = filter(conc, date == '2015-07-29')
conc_base_tp = filter(conc_base, fraction == 'tp')
conc_dry = read_csv('input/concmeans.csv') %>%
filter(., site != 'MC04') %>%
gather(., 'fraction', 'conc', 8:12) %>%
filter(., conc == 'dry')
# TP Only
plt.base_conc = map.wallsburg +
geom_point(data = conc_base_tp,
aes(long, lat, size = conc),
pch = 21, col = 'black') +
geom_point(data = conc_base_tp,
aes(long, lat, size = conc),
pch = 16, col = "black", alpha = 0.1) +
geom_point(data = flow_dry,
aes(x = long, y = lat, shape = as.factor(ls)),
size = 2) +
scale_size_area(breaks = c(0, 0.05, 0.1, 0.25),
max_size = 10,
name = 'Total P\n(mg/L)') +
scale_shape_discrete(name = '', labels = c('Dry'))
plt.base_conc
ggsave('output/tufte/base-conc.png',
width = 5, height = 7, dpi = 1200)
```
```{r mean-TP}
# Mean TP overtime
plt.conc_tp = map.wallsburg +
geom_point(data = conc_mean_tp,
aes(long, lat, size = conc),
col = 'black', pch = 21) +
geom_point(data = conc_mean_tp,
aes(long, lat, size = conc),
pch = 16, col = "black", alpha = 0.1) +
scale_size_area(breaks = c(0, 0.05, 0.1, 0.2, 0.3),
max_size = 10,
name = 'Total P (mg/L)') +
facet_wrap(~ hydros, nrow = 1, labeller = labeller(.multi_line = FALSE))
plt.conc_tp
ggsave('output/tufte/all-conc.png',
width = 12, height = 6, dpi = 1200)
```
```{r 5P-conc, fig.height=8, fig.width=5}
# ggplot for concentration as a function of fraction and date
plt.conc = map.wallsburg +
geom_point(data = conc_mean,
aes(long, lat, size = conc),
col = 'black', pch = 21) +
geom_point(data = conc_mean,
aes(long, lat, size = conc), # fill
pch = 16, col = "black", alpha = 0.1) +
scale_size_area(breaks = c(0, .01, 0.05, 0.1, 0.2),
max_size = 8,
name = 'Concentration\n(mg/L)' ) +
facet_grid(hydros ~ fractions,
labeller = labeller(.multi_line = FALSE)) +
theme(legend.position = 'left')
plt.conc
ggsave('output/tufte/frac-conc.png',
width = 10, height = 6, dpi = 1200)
```
# Loads
```{r load-data}
# import data
load = read_csv('input/master_load.csv',
na = c('no_access', 'lost')) %>%
filter(., site != 'MC04') %>%
select(., id:DOP, -cfs, -ls) %>%
gather(., 'fraction', 'load', 10:14)
load_mean = read_csv('input/loadmeans.csv') %>%
filter(., site != 'MC04') %>%
gather(., 'fraction', 'load', 8:12)
load_mean$fractions = factor(load_mean$fraction,
levels = c('TP','PP','TDP','DRP','DOP'))
load_mean$hydros = factor(load_mean$hydro,
levels = c('Rise','Peak','Base'))
load_mean_tp = filter(load_mean, fraction == 'TP')
load_base = filter(load, date == '2015-07-29')
load_base_tp = filter(load_base, fraction == 'TP')
load_dry = read_csv('input/master_load.csv') %>%
filter(., site != 'MC04') %>%
select(., id:DOP, -cfs, -ls) %>%
gather(., 'fraction', 'load', 10:14) %>%
filter(., load == 'no_access' | load == 'lost')
```
The loads at each site are as follows in the plots below, starting with the baseflow streamflow of 2015, then the concentrations in each of the averaged hydrologic periods, and then the concentrations of each fraction in each hydrologic period.
```{r load}
#TP Load at Baseflow in 2015
plt.base_load = map.wallsburg +
geom_point(data = load_base_tp,
aes(long, lat, size = load),
pch = 21, col = 'black') +
geom_point(data = load_base_tp,
aes(long, lat, size = load),
pch = 16, col = "black", alpha = 0.1) +
scale_color_discrete(name = 'No Result',
labels = 'Dry\nStreambed') +
scale_size_area(breaks = c(0, 1, 3, 5),
max_size = 10,
name = 'TP Load\n(mg/s)')
plt.base_load
ggsave('output/tufte/base-load.png',
width = 5, height = 7, dpi = 1200)
# TP Loads in each hydrologic period
plt.load_tp = map.wallsburg +
geom_point(data = load_mean_tp,
aes(long, lat, size = load),
pch = 21, col = 'black') +
geom_point(data = load_mean_tp,
aes(long, lat, size = load), # fill
pch = 16, col = 'black', alpha = 0.1) +
scale_size_area(breaks = c(0, 5, 10, 20, 30),
max_size = 10,
name = "Load\n(mg/s)") +
facet_wrap(~ hydros,
ncol = 4,
labeller = labeller(.multi_line = FALSE))
plt.load_tp
ggsave('output/tufte/all-load.png',
width = 12, height = 6, dpi = 1200)
```
```{r loads by fractions, echo=FALSE, fig.height=8, fig.width=5}
# Loads by fraction in each hydrologic period
plt.load = map.wallsburg +
geom_point(data = load_mean,
aes(long, lat, size = load),
pch = 21, col = "black") +
geom_point(data = load_mean,
aes(long, lat, size = load), # fill
pch = 16, col = "black", alpha = 0.1) +
scale_size_area(breaks = c(0, 5, 10, 20, 30),
max_size = 8,
name = "Load\n(mg/s)") +
facet_grid(hydros ~ fractions,
labeller = labeller(.multi_line = FALSE)) +
theme(legend.position = "left")
plt.load
ggsave('output/tufte/frac-load.png',
width = 12, height = 8, dpi = 1200)
```
Once more, the results shown spatially as in the baseflow period of 2015.
```{r gridExtra, echo=FALSE, fig.height=8, fig.width=7}
# Put all baseflow 2015 plots together for export
grid.arrange(plt.flow201507, plt.base_conc, plt.base_load, ncol = 2)
```
# Bar Charts
While the map plots shown previously provide insight in to spatial loading patterns, there was a need to focus on the profiles of Main Creek and Spring Creek through bar charts to better understand how P fractions were changing over space, time, and in relation to each other. Remember the abbreviations are such: total P (TP), particulate P (PP), dissolved reactive P (DRP), and dissolved organic P (DOP). Total dissolved P (TDP) is simply the sum of DRP and DOP, and TP is simply the sum of TDP and PP.
```{r bar-data}
load_mc_tp = filter(load_mean, stream == 'M') %>%
filter(., fraction == 'TP')
load_mc_3frac = filter(load_mean, stream == 'M') %>%
filter(., fraction != 'TP' & fraction != 'TDP')
load_mc_base = filter(load_base_tp, stream == 'M')
load_mean_3frac = filter(load_mean, fraction != 'TP' & fraction != 'TDP')
load_sc_3frac = filter(load_mean, stream == 'S') %>%
filter(., fraction != 'TP' & fraction != 'TDP')
```
```{r bar-01, warning=FALSE}
# Barplot for TP loads over time
# Follow this layout for other bar charts
bar = ggplot() +
scale_fill_manual(values = c('#000000',
'#555555',
'#aaaaaa'),
name = "P Fraction") +
theme_tufte() +
theme(legend.position = 'right',
strip.text = element_text(size = 10),
axis.text.x = element_text(angle = 90, vjust = 0.5))
bar.mc_tp = bar +
geom_col(data = load_mc_tp,
aes(x = site, y = load, fill = fractions),
position = 'stack') +
facet_grid(hydros ~ .,
scales = 'free') +
labs(x = NULL, y = 'Load (mg/s)')
bar.mc_tp
ggsave('output/tufte/bar-mc-all.png',
width = 5, height = 8, dpi = 1200)
```
```{r bar-02}
# Bar chart showing Main Creek profile in baseflow 2015
bar.mc_base = bar +
geom_col(data = load_mc_base,
aes(x = site, y = load, fill = fraction)) +
labs(x = NULL,
y = 'Load (mg/s)')
bar.mc_base
ggsave('output/tufte/bar-mc-base.png',
width = 6, height = 4, dpi = 1200)
# Bar chart with sites along x, and loads of free scale on y
# This allows for one to compare the pattern of load increases and decreases
# across various hydrologic periods
bar.mc = bar +
geom_col(data = load_mc_3frac,
aes(x = site, y = load, fill = fractions),
position = 'stack') +
facet_grid(hydros ~ .,
scales = 'free') +
labs(x = NULL,
y = 'Load (mg/s)')
bar.mc
ggsave('output/tufte/bar-mc-frac.png',
width = 5, height = 8, dpi = 1200)
# This has the fractions scaled to 100 % (which represents TP)
# In this way, proportions can be compared across sites.
bar.mc_percent = bar +
geom_col(data = load_mc_3frac,
aes(x = site, y = load, fill = fractions),
position = 'fill') +
facet_grid(hydros ~ .) +
labs(x = NULL,
y = 'Percent Composition')
bar.mc_percent
ggsave('output/tufte/bar-mc-perc.png',
width = 5, height = 8, dpi = 1200)
# For each site, which isn't really necessary if you just look vertically
# on the other graphs
bar.sites_percent = bar +
geom_col(data = load_mean_3frac,
aes(x = hydros, y = load, fill = fractions),
position = 'fill') +
facet_wrap(~ site) +
labs(x = NULL,
y = 'Load (mg/s)')
bar.sites_percent
ggsave('output/tufte/bar-sites.png',
width = 8, height = 8, dpi = 1200)
```
The chart `bar.sites_percent` didn't make it into the thesis, but it highlights in a different way some of the insights explained in the thesis document about temporal trends in P loading across the watershed.
Similar charts were made for Spring Creek because Spring Creek was identified as an important contributor to P loads in Main Creek.
```{r spring-creek-bar, warning=FALSE}
bar.sc = bar +
geom_col(data = load_sc_3frac,
aes(x = site, y = load, fill = fractions),
position = 'stack') +
facet_grid(hydros ~ .,
scales = 'free') +
labs(x = NULL,
y = 'Load (mg/s)')
bar.sc
ggsave('output/tufte/bar-sc-frac.png',
width = 5, height = 8, dpi = 1200)
bar.sc_percent = bar +
geom_col(data = load_sc_3frac,
aes(x = site, y = load, fill = fractions),
position = 'fill') +
facet_grid(hydros ~ .) +
labs(x = NULL,
y = 'Percent Composition')
bar.sc_percent
ggsave('output/tufte/bar-sc-perc.png',
width = 5, height = 8, dpi = 1200)
```