-
Notifications
You must be signed in to change notification settings - Fork 0
/
WorkflowDecisions.Rmd
653 lines (485 loc) · 29.7 KB
/
WorkflowDecisions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
---
title: "VAST index workflow"
author: "Sarah Gaichas and Sean Lucey"
date: "`r Sys.Date()`"
output:
html_document:
code_fold: hide
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
message = FALSE,
warning = FALSE)
library(tidyverse)
library(here)
library(DT)
```
# Introduction
We used stomach contents data to estimate an index of forage fish biomass for the 2023 state of the ecosystem reports and the bluefish stock assessment. Here, we extend that idea to estimate an index of benthic group biomass for use in fitting food web models of the Gulf of Maine (GOM) and George Bank (GB).
The basic workflow is to develop a dataset of stomach contents data where fish predators act as samplers of the prey field, then fit a vector autoregressive spatio-temporal (VAST) model to this dataset to generate an index.
# Data
## Which prey?
Sean has mapped the food habits categories to Rpath categories. We want indices for the benthic categories Megafauna and Macrofauna in the GB and GOM Rpath models.
Upon review of these categories and model input data by Sean and Sarah Weisberg, two changes were made from the original list.
* The prey "PECTINIDAE", "PECTINIDAE SHELL", "PECTINIDAE VISCERA" were removed from macrobenthos and added to the AtlScallop group, as they likely represent sea scallops identified in stomachs. Therefore these prey categories will not be included in the benthos indices.
* The prey "ALBUNEA PARETII", "EMERITA TALPOIDA", "RANILIA MURICATA", "HIPPIDAE" were removed from megabenthos and added to macrobenthos, because these are small sand crabs and mole crabs.
All tables in this document have been adjusted to reflect these changes.
```{r}
# object is called prey
load(here("data/prey.RData")) #original mappings from June 2023
addtomacro <- prey |>
dplyr::filter(RPATH == "Megabenthos") |>
dplyr::filter(PYNAM %in% c("ALBUNEA PARETII", "EMERITA TALPOIDA", "RANILIA MURICATA", "HIPPIDAE"))
megaben <- prey |>
dplyr::filter(RPATH == "Megabenthos") |>
dplyr::filter(!PYNAM %in% c("ALBUNEA PARETII", "EMERITA TALPOIDA", "RANILIA MURICATA", "HIPPIDAE"))
macroben <- prey |>
dplyr::filter(RPATH == "Macrobenthos") |>
dplyr::filter(!PYNAM %in% c("PECTINIDAE", "PECTINIDAE SHELL", "PECTINIDAE VISCERA")) |>
dplyr::bind_rows(addtomacro)
```
The Macrobenthos Rpath category has `r dim(macroben)[1]` food habits database species codes:
```{r}
datatable(macroben, rownames = FALSE,
extensions = c("FixedColumns"),
caption = "Macrobenthos species names and codes",
options = list(pageLength = 10,
scrollX = TRUE,
fixedColumns = list(leftColumns = 1)))
```
The Megabenthos Rpath category has `r dim(megaben)[1]` food habits database species codes:
```{r}
datatable(megaben, rownames = FALSE,
extensions = c("FixedColumns"),
caption = "Megabenthos species names and codes",
options = list(pageLength = 10,
scrollX = TRUE,
fixedColumns = list(leftColumns = 1)))
```
Which of these show up most often in predator stomachs?
```{r}
# object is called `allfh`
load(url("https://github.com/NOAA-EDAB/forageindex/raw/main/fhdat/allfh.Rdata"))
#object is called allfh21
load(url("https://github.com/NOAA-EDAB/forageindex/raw/main/fhdat/allfh21.RData"))
allfh <- allfh %>%
dplyr::bind_rows(allfh21)
macrobenfh <- allfh %>%
dplyr::left_join(macroben %>%
dplyr::select(PYNAM, RPATH) %>%
setNames(., tolower(names(.)))) %>%
dplyr::filter(!is.na(rpath))
megabenfh <- allfh %>%
dplyr::left_join(megaben %>%
dplyr::select(PYNAM, RPATH) %>%
setNames(., tolower(names(.)))) %>%
dplyr::filter(!is.na(rpath))
macropreycount <- macrobenfh %>%
#group_by(pdcomnam, pynam) %>%
group_by(pynam) %>%
summarise(count = n()) %>%
#filter(pdcomnam != "") %>%
arrange(desc(count))
#pivot_wider(names_from = pdcomnam, values_from = count)
megapreycount <- megabenfh %>%
#group_by(pdcomnam, pynam) %>%
group_by(pynam) %>%
summarise(count = n()) %>%
#filter(pdcomnam != "") %>%
arrange(desc(count))
#pivot_wider(names_from = pdcomnam, values_from = count)
datatable(macropreycount, rownames = FALSE,
extensions = c("FixedColumns"),
caption = "Number of stomachs with each macrobenthos species across all predators, NEFSC 1973-2021",
options = list(pageLength = 25,
scrollX = TRUE,
fixedColumns = list(leftColumns = 1)))
datatable(megapreycount, rownames = FALSE,
extensions = c("FixedColumns"),
caption = "Number of stomachs with each megabenthos species across all predators, NEFSC 1973-2021",
options = list(pageLength = 25,
scrollX = TRUE,
fixedColumns = list(leftColumns = 1)))
```
## Which predators?
We have a couple of options for a predator list. First is to use every predator (or all above a cutoff number of observations, maybe 50 or 100 across the whole database) where something in the prey list was observed, second is to use all benthivores or some other group(s) as identified by diet similarity.
```{r}
macropredcount <- macrobenfh %>%
group_by(pdcomnam) %>%
summarise(count = n()) %>%
#filter(pdcomnam != "") %>%
arrange(desc(count))
megapredcount <- megabenfh %>%
group_by(pdcomnam) %>%
summarise(count = n()) %>%
#filter(pdcomnam != "") %>%
arrange(desc(count))
datatable(macropredcount, rownames = FALSE,
extensions = c("FixedColumns"),
caption = "Number of macrobenthos species observations by predator, NEFSC 1973-2021",
options = list(pageLength = 25,
scrollX = TRUE,
fixedColumns = list(leftColumns = 1)))
datatable(megapredcount, rownames = FALSE,
extensions = c("FixedColumns"),
caption = "Number of megabenthos species observations by predator, NEFSC 1973-2021",
options = list(pageLength = 25,
scrollX = TRUE,
fixedColumns = list(leftColumns = 1)))
```
The clusters based on diet similarity show several clusters that appear on the lists above. I think because we want to include so many prey groups, the cluster groupings might actually limit us, or we would have to use more than one cluster:
```{r, fig.height=12}
dietoverlap <- read_csv(here("data-raw/tgmat.2022-02-15.csv"))
# follows example here https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html
library(dendextend)
d_dietoverlap <- dist(dietoverlap)
guilds <- hclust(d_dietoverlap)
#plot(guilds)
dend <- as.dendrogram(guilds)
dend <- rotate(dend, 1:136)
dend <- color_branches(dend, k=6) # Brian uses 6 categories
labels(dend) <- paste(as.character(names(dietoverlap[-1]))[order.dendrogram(dend)],
"(",labels(dend),")",
sep = "")
dend <- hang.dendrogram(dend,hang_height=0.1)
# reduce the size of the labels:
# dend <- assign_values_to_leaves_nodePar(dend, 0.5, "lab.cex")
dend <- set(dend, "labels_cex", 0.5)
# And plot:
par(mar = c(3,3,3,7))
plot(dend,
main = "Clustered NEFSC diet data, (complete)
(the labels give the predator species/size)",
horiz = TRUE, nodePar = list(cex = .007))
#legend("topleft", legend = iris_species, fill = rainbow_hcl(3))
```
We decided to use the cluster groups to eliminate pelagic species that would only rarely consume these prey. Therefore we will use species in 4 of the 6 clusters, but not use species in the planktivore cluster (blue above) or the piscivore cluster (gold above).
Therefore, the following is the NEFSC BTS predator/size list:
```{r}
pisc <- partition_leaves(dend)[[
which_node(dend, c("Bluefish..S(37)", "Bluefish..M(36)", "Bluefish..L(35)"))
]]
piscdf <- data.frame("COMNAME" = toupper(str_remove(pisc, "\\..*")),
"SizeCat" = str_remove(str_extract(pisc, "\\..*[:upper:]+"), "\\.."))
plank <- partition_leaves(dend)[[
which_node(dend, c("Atlantic herring..S(20)", "Atlantic mackerel..S(23)", "Blueback herring..XS(34)"))
]]
plankdf <- data.frame("COMNAME" = toupper(str_remove(plank, "\\..*")),
"SizeCat" = str_remove(str_extract(plank, "\\..*[:upper:]+"), "\\.."))
all <- partition_leaves(dend)[[
which_node(dend, c("Barndoor skate..S(26)", "Bluefish..L(35)"))
]]
alldf <- data.frame("COMNAME" = toupper(str_remove(all, "\\..*")),
"SizeCat" = str_remove(str_extract(all, "\\..*[:upper:]+"), "\\.."))
sizedef <- readxl::read_excel(here::here("data-raw/Table3.xlsx"),
range = "B3:H55") |> # table of NEFSC FH size categories from Brian
dplyr::rename(XS = `Extra-Small`,
S = "Small",
M = "Medium",
L = "Large",
XL = `Extra-Large`,
CommonName = `Common Name`,
SpeciesName = `Species Name`) |>
tidyr::pivot_longer(-c(1:2), names_to = "sizecat", values_to = "range") |>
dplyr::filter(!range=="-") |>
dplyr::mutate(COMNAME = toupper(CommonName))
# add missing by hand from word Table 1.docx
# (pollock was misspelled, fixed in spreadsheet)
# (removed "flounder" from Windowpane in spreadsheet)
# asked Brian for these; response:
# For these predators and most likely any others not listed in TM216,
# the default is S (<=20 cm), M (>20 and <=50 cm), and L (>50 cm).
# Northern kingfish M
# Buckler dory S
# Fourbeard rockling S
# Fourbeard rockling M
# Cunner M
extra <- data.frame(CommonName = c("Northern kingfish",
"Buckler dory",
"Fourbeard rockling",
"Fourbeard rockling",
"Cunner"),
sizecat = c("M",
"S",
"S",
"M",
"M"),
range = c("21-50",
"≤20",
"≤20",
"21-50",
"21-50")) |>
dplyr::mutate(COMNAME = toupper(CommonName))
sizedef <- bind_rows(sizedef, extra)
# dplyr::mutate(COMNAME = toupper(`Common Name`),
# XS = readr::parse_number(`Extra-Small`),
# S = readr::parse_number(Small),
# M = readr::parse_number(Medium),
# L = readr::parse_number(Large),
# XL = readr::parse_number(`Extra-Large`))
# we'll call benthivores everything but piscivores and planktivores
benthivores <- alldf |>
dplyr::anti_join(dplyr::bind_rows(piscdf, plankdf)) |>
dplyr::left_join(sizedef, join_by(COMNAME == COMNAME, SizeCat == sizecat)) |>
dplyr::left_join(ecodata::species_groupings, by = "COMNAME") |>
dplyr::select(COMNAME, ITISSPP, SCINAME, SizeCat = SizeCat.x, SizeRange_cm = range) |>
dplyr::distinct()
datatable(benthivores, rownames = FALSE,
extensions = c("FixedColumns"),
caption = "Predator list for benthos indices",
options = list(pageLength = 90,
scrollX = TRUE,
fixedColumns = list(leftColumns = 1)))
# no need to save this every time
#saveRDS(benthivores, here::here("data/benthivorelist.rds"))
```
## Which surveys?
Start with NEFSC, find out if MA and ME/NH surveys collect diet information, and add NEAMAP later once predator and prey lists are well established.
MA and ME/NH surveys do not collect diet information.
As of January 2024, predator and prey lists are established enough for a NEAMAP request. Just needed to add the size categories; almost there...
Add ITISSPP to prey lists where possible
```{r, eval=FALSE}
macrobenITIS <- macroben |>
dplyr::select(PYNAM, PYCOMNAM, PYSPP) |>
dplyr::left_join(ecodata::species_groupings, join_by(PYNAM == SCINAME)) |>
dplyr::select(PYNAM, PYCOMNAM, COMNAME, PYSPP, ITISSPP)
megabenITIS <- megaben |>
dplyr::select(PYNAM, PYCOMNAM, PYSPP) |>
dplyr::left_join(ecodata::species_groupings, join_by(PYNAM == SCINAME)) |>
dplyr::select(PYNAM, PYCOMNAM, COMNAME, PYSPP, ITISSPP)
saveRDS(macrobenITIS, here::here("data/macrobenthosprey.rds"))
saveRDS(megabenITIS, here::here("data/megabenthosprey.rds"))
```
NEAMAP survey results received May 2024 using the above prey lists.
Jim Gartland provided a lookup table of NEFSC to NEAMAP prey codes! Add to ecodata for future use.
For this modify the [data generation script in the forageindex repo](https://github.com/NOAA-EDAB/forageindex/blob/main/fhdat/VASTforage_ProcessInputDat.R).
All data have been combined in the new script [VASTbenthos_ProcessInputDat.R](https://github.com/NOAA-EDAB/benthosindex/blob/main/fhdata/VASTbenthos_ProcessInputDat.R):
```{r, code=readLines("https://raw.githubusercontent.com/NOAA-EDAB/benthosindex/main/fhdata/VASTbenthos_ProcessInputDat.R"), eval=FALSE}
```
## Spatial scale
Suggestion: use entire VAST northwest-atlantic grid for estimation and split out GOM and GB as was done for the forage fish index.
## Covariates
Suggestion: start with mean predator size at a station, number of predator species at a station, and bottom water temperature. Is there anything else we would want for benthic organisms? Reviewers suggested depth for the forage fish index, but it did not converge and may be more appropriate as a habitat covariate.
### Fill missing physical data?
When I looked at bottom temperature data in the NEFSC survey, we had as many missing values as for surface temperature, so it is likely we will want to fill these with other data sources to avoid losing information. Hubert DuPontavice provided his reconstruction of bottom temperatures based on GLORYS and ROMS outputs for the NEUS. The full 1.1 GB dataset is stored locally in data-raw/bottomtemp and the identical source file is [here on google drive](https://drive.google.com/file/d/11HanY5Tzu--77HevCV5gRdNvaAvuYVYU/view?usp=share_link).
Bottom temp has been filled in, see documentation and comparisons [here](https://noaa-edab.github.io/benthosindex/BottomTempFill.html).
# Modeling
## VAST model setup
Univariate or multivariate? First lets try separate univariate models for each of the prey groups, one model for macrobenthos and one for megabenthos. Then we can get fancier trying to look at a multivariate model?
Do we want to separate seasons or try an annual model? I guess it depends on how the Rpaths are set up.
DECISION: Fall is how the GOM is parameterized so start with that. Doing spring too.
These scripts tested fall and spring models, no covariates but estimating full spatial and spatio-temporal RE and anistotropy (current versions run with covariates and bias correction, linked below).
https://github.com/NOAA-EDAB/benthosindex/blob/fc878aa03f4b744dc87407d4348fce8abdeb82d8/VASTscripts/VASTunivariate_macrobenthos_allsurvs.R
https://github.com/NOAA-EDAB/benthosindex/blob/fc878aa03f4b744dc87407d4348fce8abdeb82d8/VASTscripts/VASTunivariate_megabenthos_allsurvs.R
## VAST model selection
Lets do the same model selection as previously. Two stages, first looking at spatial, spatio-temporal random effects and the second looking at covariates. For completeness, do selection for both models.
Model selection script is
https://github.com/NOAA-EDAB/benthosindex/blob/main/VASTscripts/VASTunivariate_benthos_modselection.R
### Stage 1 results
```{r}
# from each output folder in pyindex,
outdir <- here::here("pyindex_modsel1")
moddirs <- list.dirs(outdir)
moddirs <- moddirs[-1]
# keep folder name
modnames <- list.dirs(outdir, full.names = FALSE)
# function to apply extracting info
getmodinfo <- function(d.name){
# read settings
modpath <- stringr::str_split(d.name, "/", simplify = TRUE)
modname <- modpath[length(modpath)]
settings <- read.table(file.path(d.name, "settings.txt"), comment.char = "",
fill = TRUE, header = FALSE)
n_x <- as.numeric(as.character(settings[(which(settings[,1]=="$n_x")+1),2]))
grid_size_km <- as.numeric(as.character(settings[(which(settings[,1]=="$grid_size_km")+1),2]))
max_cells <- as.numeric(as.character(settings[(which(settings[,1]=="$max_cells")+1),2]))
use_anisotropy <- as.character(settings[(which(settings[,1]=="$use_anisotropy")+1),2])
fine_scale <- as.character(settings[(which(settings[,1]=="$fine_scale")+1),2])
bias.correct <- as.character(settings[(which(settings[,1]=="$bias.correct")+1),2])
#FieldConfig
if(settings[(which(settings[,1]=="$FieldConfig")+1),1]=="Component_1"){
omega1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+2),2])
omega2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),1])
epsilon1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),2])
epsilon2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+5),1])
beta1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+6),2])
beta2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+7),1])
}
if(settings[(which(settings[,1]=="$FieldConfig")+1),1]=="Omega1"){
omega1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),1])
omega2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),1])
epsilon1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),2])
epsilon2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),2])
beta1 <- "IID"
beta2 <- "IID"
}
#RhoConfig
rho_beta1 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+3),1]))
rho_beta2 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+3),2]))
rho_epsilon1 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+4),1]))
rho_epsilon2 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+4),2]))
# read parameter estimates, object is called parameter_Estimates
if(file.exists(file.path(d.name, "parameter_estimates.RData"))) {
load(file.path(d.name, "parameter_estimates.RData"))
AIC <- parameter_estimates$AIC[1]
converged <- parameter_estimates$Convergence_check[1]
fixedcoeff <- unname(parameter_estimates$number_of_coefficients[2])
randomcoeff <- unname(parameter_estimates$number_of_coefficients[3])
}else if(file.exists(file.path(d.name, "parameter_estimates.txt"))){
reptext <- readLines(file.path(d.name, "parameter_estimates.txt"))
AIC <- as.double(reptext[grep(reptext, pattern = "AIC")+1])
converged <- reptext[grep(reptext, pattern = "Convergence_check")+1]
fixedcoeff <- as.integer(stringr::str_split(reptext[grep(reptext, pattern = "number_of_coefficients")+2],
boundary("word"))[[1]][2])
randomcoeff <- as.integer(stringr::str_split(reptext[grep(reptext, pattern = "number_of_coefficients")+2],
boundary("word"))[[1]][3])
}else{
AIC <- NA_real_
converged <- NA_character_
fixedcoeff <- NA_integer_
randomcoeff <- NA_integer_
}
#index <- read.csv(file.path(d.name, "Index.csv"))
# return model attributes as a dataframe
out <- data.frame(modname = modname,
n_x = n_x,
grid_size_km = grid_size_km,
max_cells = max_cells,
use_anisotropy = use_anisotropy,
fine_scale = fine_scale,
bias.correct = bias.correct,
omega1 = omega1,
omega2 = omega2,
epsilon1 = epsilon1,
epsilon2 = epsilon2,
beta1 = beta1,
beta2 = beta2,
rho_epsilon1 = rho_epsilon1,
rho_epsilon2 = rho_epsilon2,
rho_beta1 = rho_beta1,
rho_beta2 = rho_beta2,
AIC = AIC,
converged = converged,
fixedcoeff = fixedcoeff,
randomcoeff = randomcoeff#,
#index = index
)
return(out)
}
modcompare <- purrr::map_dfr(moddirs, getmodinfo)
```
Models compared using REML are identified by model name ("modname" in Table Sb\@ref(tab:modsel1)) which always includes all prey aggregated, season ("all" for annual models of months 1-12, "fall" for models of months 7-12, and "spring" for models of months 1-6), number of knots (500 for all models), and which fixed and random spatial and spatio-temporal effects were included in which linear predictor (1 or 2). The names for model options and associated VAST model settings are:
Model selection 1 (spatial, spatio-temporal effects, no covariates) options (following @ng_predator_2021) and naming:
* All models set Use_REML = TRUE in fit_model specifications.
* Modeled effects, model name suffix, and VAST settings by model:
1. "_alleffectson" = Spatial and spatio-temporal random effects plus anisotropy in both linear predictors; FieldConfig default (all IID)
1. "_noaniso" = Spatial and spatio-temporal random effects in both linear predictors with anisotopy turned off; FieldConfig default (all IID) and `use_anisotopy = FALSE`
1. "_noomeps2" = Spatial and spatio-temporal random effects plus anisotropy only in linear predictor 1; FieldConfig = 0 for Omega2, Epsilon2
1. "_noomeps2_noaniso" = Spatial and spatio-temporal random effects only in linear predictor 1 with anisotopy turned off; FieldConfig = 0 for Omega2, Epsilon2 and `use_anisotopy = FALSE`
1. "_noomeps2_noeps1" = Spatial random effects plus anisotropy only in linear predictor 1; FieldConfig = 0 for Omega2, Epsilon2, Epsilon1
1. "_noomeps2_noeps1_noaniso" = Spatial random effects only in linear predictor 1 with anisotopy turned off; FieldConfig = 0 for Omega2, Epsilon2, Epsilon1 and `use_anisotopy = FALSE`
1. "_noomeps12" = Anisotropy, but no spatial or spatio-temporal random effects in either linear predictor; FieldConfig = 0 for both Omega, Epsilon
1. "_noomeps12_noaniso" = No spatial or spatio-temporal random effects in either linear predictor; FieldConfig = 0 for both Omega, Epsilon and `use_anisotopy = FALSE`
```{r modsel1}
modselect <- modcompare %>%
dplyr::mutate(season = case_when(str_detect(modname, "_fall_") ~ "Fall",
str_detect(modname, "spring") ~ "Spring",
str_detect(modname, "_all_") ~ "Annual",
TRUE ~ as.character(NA))) %>%
dplyr::mutate(converged2 = case_when(str_detect(converged, "no evidence") ~ "likely",
str_detect(converged, "is likely not") ~ "unlikely",
TRUE ~ as.character(NA))) %>%
dplyr::mutate(bengroup = case_when(str_detect(modname, "macroben") ~ "macrobenthos",
str_detect(modname, "megaben") ~ "megabenthos",
TRUE ~ as.character(NA))) %>%
#dplyr::mutate(modname = str_extract(modname, '(?<=allagg_).*')) |>
dplyr::group_by(bengroup, season) %>%
dplyr::mutate(deltaAIC = AIC-min(AIC)) %>%
dplyr::select(modname, season, deltaAIC, fixedcoeff,
randomcoeff, use_anisotropy,
omega1, omega2, epsilon1, epsilon2,
beta1, beta2, AIC, converged2) %>%
dplyr::arrange(season, AIC)
# DT::datatable(modselect, rownames = FALSE,
# options= list(pageLength = 25, scrollX = TRUE),
# caption = "Comparison of delta AIC values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures. See text for model descriptions.")
flextable::flextable(modselect %>%
dplyr::select(-c(use_anisotropy,
omega1, omega2, epsilon1, epsilon2,
beta1, beta2))) %>%
flextable::set_header_labels(modname = "Model name",
season = "Season",
deltaAIC = "dAIC",
fixedcoeff = "N fixed",
randomcoeff = "N random",
converged2 = "Convergence") |>
flextable::set_caption("Comparison of delta AIC (dAIC) values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures, with number of fixed (N fixed) and random (N random) coefficients. See text for model descriptions associated with each model name.") %>%
flextable::fontsize(size = 9, part = "all") %>%
flextable::colformat_double(digits = 2) |>
flextable::set_table_properties(layout = "autofit", width = 1)
```
Using REML, models including spatial and spatio-temporal random effects as well as anisotropy were best supported by the data. This was true for the spring dataset and the fall dataset for both macrobenthos and megabenthos.
Full outputs from stage 1 mmodel selection are posted to google drive [here](https://drive.google.com/drive/folders/1hiSkfavmvnSx0baCczty0upjFmK6YHin?usp=drive_link).
All models in stage 2 included spatial and spatio-temporal random effects and anisotropy.
### Stage 2 results
For the second step of model selection with different combinations of vessel effects and or catchability covariates, "modname" in Table Sb\@ref(tab:modsel2) follows a similar pattern as above, with all prey aggregated ("allagg" for all models), season ("all" for annual models of months 1-12, "fall" for models of months 7-12, and "spring" for models of months 1-6), number of knots (500 for all models), and which vessel effects or covariates were included. The names for model options and associated VAST model settings are:
Model selection 2 (covariates) options, FieldConfig default (all IID), with anisotropy:
1. "_base" = No vessel overdispersion or length/number covariates
1. "_len" = Predator mean length covariate
1. "_num" = Number of predator species covariate
1. "_lennum" = Predator mean length and number of predator species covariates
1. "_bt" = Combined in situ and model interpolated bottom temp covariate
1. "_lenbt" = Predator mean length and bottom temp covariates
1. "_numbt" = Number of predator species and bottom temp covariates
1. "_lennumbt" = Predator mean length, number of predator species, and bottom temp covariates
1. "_eta10" = Overdispersion (vessel effect) in first linear predictor (prey encounter)
1. "_eta11" = Overdispersion (vessel effect) in both linear predictors (prey encounter and weight)
```{r modsel2}
# from each output folder in pyindex,
outdir <- here("pyindex_modsel2")
moddirs <- list.dirs(outdir)
moddirs <- moddirs[-1]
# keep folder name
modnames <- list.dirs(outdir, full.names = FALSE)
modcompare2 <- purrr::map_dfr(moddirs, getmodinfo)
modselect2 <- modcompare2 %>%
dplyr::mutate(season = case_when(str_detect(modname, "_fall_") ~ "Fall",
str_detect(modname, "spring") ~ "Spring",
str_detect(modname, "_all_") ~ "Annual",
TRUE ~ as.character(NA))) %>%
dplyr::mutate(converged2 = case_when(str_detect(converged, "no evidence") ~ "likely",
str_detect(converged, "is likely not") ~ "unlikely",
TRUE ~ as.character(NA))) %>%
dplyr::mutate(bengroup = case_when(str_detect(modname, "macroben") ~ "macrobenthos",
str_detect(modname, "megaben") ~ "megabenthos",
TRUE ~ as.character(NA))) %>%
dplyr::group_by(bengroup, season) %>%
dplyr::mutate(deltaAIC = AIC-min(AIC, na.rm = TRUE)) %>%
dplyr::select(modname, season, deltaAIC, fixedcoeff,
randomcoeff, use_anisotropy,
omega1, omega2, epsilon1, epsilon2,
beta1, beta2, AIC, converged2) %>%
dplyr::arrange(season, AIC)
# DT::datatable(modselect2, rownames = FALSE,
# options= list(pageLength = 50, scrollX = TRUE),
# caption = "Comparison of delta AIC values for alternative vessel effects and catchability covariates. See text for model descriptions.")
flextable::flextable(modselect2%>%
dplyr::select(-c(use_anisotropy,
omega1, omega2, epsilon1, epsilon2,
beta1, beta2))) %>%
flextable::set_header_labels(modname = "Model name",
season = "Season",
deltaAIC = "dAIC",
fixedcoeff = "N fixed coefficients",
randomcoeff = "N random coefficients",
converged2 = "Convergence") |>
flextable::set_caption("Comparison of delta AIC (dAIC) values for alternative vessel effects and catchability covariates. See text for model descriptions associated with each model name.") %>%
flextable::fontsize(size = 9, part = "all") %>%
flextable::colformat_double(digits = 2) |>
flextable::set_table_properties(layout = "autofit", width = .9)
```
Overall, combinations of catchability covariates were better supported by the data than vessel effects. Model comparisons above led us to the best model fit using mean predator length, number of predator species, and bottom temperature at a survey station as catchability covariates.
Full results from model selection for covariates are posted to the google drive [here](https://drive.google.com/drive/folders/1xT7Pt8zzvQOsqwJifT_VI5hvl_pQzrjM?usp=drive_link).
## VAST model bias correction and results
Scripts for this are https://github.com/NOAA-EDAB/benthosindex/blob/main/VASTscripts/VASTunivariate_megabenthos_allsurvs.R and https://github.com/NOAA-EDAB/benthosindex/blob/main/VASTscripts/VASTunivariate_macrobenthos_allsurvs.R
Results have been posted to the google drive [here](https://drive.google.com/drive/folders/1mhoSNmmK0rE374ofBk2iUDmbjcErIRvA?usp=drive_link). See BenthosResults for the initial views and interpretation.