WorkflowDecisions.Rmd

---
title: "VAST index workflow"
author: "Sarah Gaichas and Sean Lucey"
date: "`r Sys.Date()`"
output:
  html_document:
    code_fold: hide
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      warning = FALSE)

library(tidyverse)
library(here)
library(DT)
```

# Introduction

We used stomach contents data to estimate an index of forage fish biomass for the 2023 state of the ecosystem reports and the bluefish stock assessment. Here, we extend that idea to estimate an index of benthic group biomass for use in fitting food web models of the Gulf of Maine (GOM) and George Bank (GB). 

The basic workflow is to develop a dataset of stomach contents data where fish predators act as samplers of the prey field, then fit a vector autoregressive spatio-temporal (VAST) model to this dataset to generate an index. 

# Data

## Which prey?

Sean has mapped the food habits categories to Rpath categories. We want indices for the benthic categories Megafauna and Macrofauna in the GB and GOM Rpath models.

Upon review of these categories and model input data by Sean and Sarah Weisberg, two changes were made from the original list.

*  The prey "PECTINIDAE", "PECTINIDAE SHELL", "PECTINIDAE VISCERA" were removed from macrobenthos and added to the AtlScallop group, as they likely represent sea scallops identified in stomachs. Therefore these prey categories will not be included in the benthos indices.

*  The prey "ALBUNEA PARETII", "EMERITA TALPOIDA", "RANILIA MURICATA", "HIPPIDAE" were removed from megabenthos and added to macrobenthos, because these are small sand crabs and mole crabs.

All tables in this document have been adjusted to reflect these changes.


```{r}
# object is called prey
load(here("data/prey.RData")) #original mappings from June 2023

addtomacro <- prey |>
  dplyr::filter(RPATH == "Megabenthos") |> 
  dplyr::filter(PYNAM %in% c("ALBUNEA PARETII", "EMERITA TALPOIDA", "RANILIA MURICATA", "HIPPIDAE"))

megaben <- prey |>
  dplyr::filter(RPATH == "Megabenthos") |>
  dplyr::filter(!PYNAM %in% c("ALBUNEA PARETII", "EMERITA TALPOIDA", "RANILIA MURICATA", "HIPPIDAE"))

macroben <- prey |>
  dplyr::filter(RPATH == "Macrobenthos") |> 
  dplyr::filter(!PYNAM %in% c("PECTINIDAE", "PECTINIDAE SHELL", "PECTINIDAE VISCERA")) |>
  dplyr::bind_rows(addtomacro)


```


The Macrobenthos Rpath category has `r dim(macroben)[1]` food habits database species codes:

```{r}
datatable(macroben, rownames = FALSE,
          extensions = c("FixedColumns"),
          caption = "Macrobenthos species names and codes",
          options = list(pageLength = 10,
                         scrollX = TRUE,
                         fixedColumns = list(leftColumns = 1)))
```

The Megabenthos Rpath category has `r dim(megaben)[1]` food habits database species codes:

```{r}
datatable(megaben, rownames = FALSE,
          extensions = c("FixedColumns"),
          caption = "Megabenthos species names and codes",
          options = list(pageLength = 10,
                         scrollX = TRUE,
                         fixedColumns = list(leftColumns = 1)))
```

Which of these show up most often in predator stomachs?

```{r}
# object is called `allfh`
load(url("https://github.com/NOAA-EDAB/forageindex/raw/main/fhdat/allfh.Rdata"))

#object is called allfh21
load(url("https://github.com/NOAA-EDAB/forageindex/raw/main/fhdat/allfh21.RData"))

allfh <- allfh %>%
  dplyr::bind_rows(allfh21)

macrobenfh <- allfh %>%
  dplyr::left_join(macroben %>% 
                     dplyr::select(PYNAM, RPATH) %>%
                     setNames(., tolower(names(.)))) %>%
  dplyr::filter(!is.na(rpath))

megabenfh <- allfh %>%
  dplyr::left_join(megaben %>% 
                     dplyr::select(PYNAM, RPATH) %>%
                     setNames(., tolower(names(.)))) %>%
  dplyr::filter(!is.na(rpath))

macropreycount <- macrobenfh %>%
   #group_by(pdcomnam, pynam) %>%
   group_by(pynam) %>%
   summarise(count = n()) %>%
   #filter(pdcomnam != "") %>%
   arrange(desc(count))
   #pivot_wider(names_from = pdcomnam, values_from = count) 

megapreycount <- megabenfh %>%
   #group_by(pdcomnam, pynam) %>%
   group_by(pynam) %>%
   summarise(count = n()) %>%
   #filter(pdcomnam != "") %>%
   arrange(desc(count))
   #pivot_wider(names_from = pdcomnam, values_from = count) 

datatable(macropreycount, rownames = FALSE,
          extensions = c("FixedColumns"),
          caption = "Number of stomachs with each macrobenthos species across all predators, NEFSC 1973-2021",
          options = list(pageLength = 25,
                         scrollX = TRUE,
                         fixedColumns = list(leftColumns = 1)))

datatable(megapreycount, rownames = FALSE,
          extensions = c("FixedColumns"),
          caption = "Number of stomachs with each megabenthos species across all predators, NEFSC 1973-2021",
          options = list(pageLength = 25,
                         scrollX = TRUE,
                         fixedColumns = list(leftColumns = 1)))

```


## Which predators?

We have a couple of options for a predator list. First is to use every predator (or all above a cutoff number of observations, maybe 50 or 100 across the whole database) where something in the prey list was observed, second is to use all benthivores or some other group(s) as identified by diet similarity.

```{r}

macropredcount <- macrobenfh %>%
  group_by(pdcomnam) %>%
   summarise(count = n()) %>%
   #filter(pdcomnam != "") %>%
   arrange(desc(count))

megapredcount <- megabenfh %>%
  group_by(pdcomnam) %>%
   summarise(count = n()) %>%
   #filter(pdcomnam != "") %>%
   arrange(desc(count))

datatable(macropredcount, rownames = FALSE,
          extensions = c("FixedColumns"),
          caption = "Number of macrobenthos species observations by predator, NEFSC 1973-2021",
          options = list(pageLength = 25,
                         scrollX = TRUE,
                         fixedColumns = list(leftColumns = 1)))

datatable(megapredcount, rownames = FALSE,
          extensions = c("FixedColumns"),
          caption = "Number of megabenthos species observations by predator, NEFSC 1973-2021",
          options = list(pageLength = 25,
                         scrollX = TRUE,
                         fixedColumns = list(leftColumns = 1)))


```

The clusters based on diet similarity show several clusters that appear on the lists above. I think because we want to include so many prey groups, the cluster groupings might actually limit us, or we would have to use more than one cluster:

```{r, fig.height=12}
dietoverlap <- read_csv(here("data-raw/tgmat.2022-02-15.csv"))

# follows example here https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html

library(dendextend)

d_dietoverlap <- dist(dietoverlap)

guilds <- hclust(d_dietoverlap)

#plot(guilds)

dend <- as.dendrogram(guilds)

dend <- rotate(dend, 1:136)

dend <- color_branches(dend, k=6) # Brian uses 6 categories

labels(dend) <- paste(as.character(names(dietoverlap[-1]))[order.dendrogram(dend)],
                           "(",labels(dend),")", 
                           sep = "")

dend <- hang.dendrogram(dend,hang_height=0.1)

# reduce the size of the labels:
# dend <- assign_values_to_leaves_nodePar(dend, 0.5, "lab.cex")
dend <- set(dend, "labels_cex", 0.5)
# And plot:
par(mar = c(3,3,3,7))
plot(dend, 
     main = "Clustered NEFSC diet data, (complete)
     (the labels give the predator species/size)", 
     horiz =  TRUE,  nodePar = list(cex = .007))
#legend("topleft", legend = iris_species, fill = rainbow_hcl(3))

```

We decided to use the cluster groups to eliminate pelagic species that would only rarely consume these prey. Therefore we will use species in 4 of the 6 clusters, but not use species in the planktivore cluster (blue above) or the piscivore cluster (gold above). 

Therefore, the following is the NEFSC BTS predator/size list:

```{r}
pisc <- partition_leaves(dend)[[
  which_node(dend, c("Bluefish..S(37)", "Bluefish..M(36)", "Bluefish..L(35)"))
]]

piscdf <- data.frame("COMNAME" = toupper(str_remove(pisc, "\\..*")),
                             "SizeCat" = str_remove(str_extract(pisc, "\\..*[:upper:]+"), "\\.."))


plank <- partition_leaves(dend)[[
  which_node(dend, c("Atlantic herring..S(20)", "Atlantic mackerel..S(23)", "Blueback herring..XS(34)"))
]]

plankdf <- data.frame("COMNAME" = toupper(str_remove(plank, "\\..*")),
                             "SizeCat" = str_remove(str_extract(plank, "\\..*[:upper:]+"), "\\.."))

all <- partition_leaves(dend)[[
  which_node(dend, c("Barndoor skate..S(26)", "Bluefish..L(35)"))
]]

alldf <- data.frame("COMNAME" = toupper(str_remove(all, "\\..*")),
                             "SizeCat" = str_remove(str_extract(all, "\\..*[:upper:]+"), "\\.."))

sizedef <- readxl::read_excel(here::here("data-raw/Table3.xlsx"),
                              range = "B3:H55") |>  # table of NEFSC FH size categories from Brian
  dplyr::rename(XS = `Extra-Small`,
                S = "Small",
                M = "Medium",
                L = "Large",
                XL = `Extra-Large`,
                CommonName = `Common Name`,
                SpeciesName = `Species Name`) |>
  tidyr::pivot_longer(-c(1:2), names_to = "sizecat", values_to = "range") |>
  dplyr::filter(!range=="-") |>
  dplyr::mutate(COMNAME = toupper(CommonName)) 
  

# add missing by hand from word Table 1.docx 
# (pollock was misspelled, fixed in spreadsheet)
# (removed "flounder" from Windowpane in spreadsheet)
# asked Brian for these; response:
# For these predators and most likely any others not listed in TM216, 
# the default is S (<=20 cm), M (>20 and <=50 cm), and L (>50 cm).
# Northern kingfish M
# Buckler dory S
# Fourbeard rockling S
# Fourbeard rockling M
# Cunner M

extra <- data.frame(CommonName = c("Northern kingfish",
                                      "Buckler dory",
                                      "Fourbeard rockling",
                                      "Fourbeard rockling",
                                      "Cunner"),  
                    sizecat = c("M",
                                "S",
                                "S",
                                "M",
                                "M"), 
                    range = c("21-50",
                              "≤20",
                              "≤20",
                              "21-50",
                              "21-50"))  |>
  dplyr::mutate(COMNAME = toupper(CommonName)) 

sizedef <- bind_rows(sizedef, extra)

 # dplyr::mutate(COMNAME = toupper(`Common Name`),
 #               XS = readr::parse_number(`Extra-Small`),
 #               S = readr::parse_number(Small),
 #               M = readr::parse_number(Medium),
 #               L = readr::parse_number(Large),
 #               XL = readr::parse_number(`Extra-Large`))
  
  
# we'll call benthivores everything but piscivores and planktivores
benthivores <- alldf |>
  dplyr::anti_join(dplyr::bind_rows(piscdf, plankdf)) |>
  dplyr::left_join(sizedef, join_by(COMNAME == COMNAME, SizeCat == sizecat)) |>
  dplyr::left_join(ecodata::species_groupings, by = "COMNAME") |>
  dplyr::select(COMNAME, ITISSPP, SCINAME, SizeCat = SizeCat.x, SizeRange_cm = range) |>
  dplyr::distinct()

datatable(benthivores, rownames = FALSE,
          extensions = c("FixedColumns"),
          caption = "Predator list for benthos indices",
          options = list(pageLength = 90,
                         scrollX = TRUE,
                         fixedColumns = list(leftColumns = 1)))

# no need to save this every time
#saveRDS(benthivores, here::here("data/benthivorelist.rds"))

```

## Which surveys?

Start with NEFSC, find out if MA and ME/NH surveys collect diet information, and add NEAMAP later once predator and prey lists are well established.

MA and ME/NH surveys do not collect diet information.

As of January 2024, predator and prey lists are established enough for a NEAMAP request. Just needed to add the size categories; almost there...

Add ITISSPP to prey lists where possible

```{r, eval=FALSE}
macrobenITIS <- macroben |>
  dplyr::select(PYNAM, PYCOMNAM, PYSPP) |>
  dplyr::left_join(ecodata::species_groupings, join_by(PYNAM == SCINAME)) |>
  dplyr::select(PYNAM, PYCOMNAM, COMNAME, PYSPP, ITISSPP)

megabenITIS <- megaben |>
  dplyr::select(PYNAM, PYCOMNAM, PYSPP) |>
  dplyr::left_join(ecodata::species_groupings, join_by(PYNAM == SCINAME)) |>
  dplyr::select(PYNAM, PYCOMNAM, COMNAME, PYSPP, ITISSPP)

saveRDS(macrobenITIS, here::here("data/macrobenthosprey.rds"))
saveRDS(megabenITIS, here::here("data/megabenthosprey.rds"))

```

NEAMAP survey results received May 2024 using the above prey lists. 

Jim Gartland provided a lookup table of NEFSC to NEAMAP prey codes! Add to ecodata for future use.

For this modify the [data generation script in the forageindex repo](https://github.com/NOAA-EDAB/forageindex/blob/main/fhdat/VASTforage_ProcessInputDat.R). 

All data have been combined in the new script [VASTbenthos_ProcessInputDat.R](https://github.com/NOAA-EDAB/benthosindex/blob/main/fhdata/VASTbenthos_ProcessInputDat.R): 

```{r, code=readLines("https://raw.githubusercontent.com/NOAA-EDAB/benthosindex/main/fhdata/VASTbenthos_ProcessInputDat.R"), eval=FALSE}

```


## Spatial scale

Suggestion: use entire VAST northwest-atlantic grid for estimation and split out GOM and GB as was done for the forage fish index.

## Covariates

Suggestion: start with mean predator size at a station, number of predator species at a station, and bottom water temperature. Is there anything else we would want for benthic organisms? Reviewers suggested depth for the forage fish index, but it did not converge and may be more appropriate as a habitat covariate. 

### Fill missing physical data?

When I looked at bottom temperature data in the NEFSC survey, we had as many missing values as for surface temperature, so it is likely we will want to fill these with other data sources to avoid losing information. Hubert DuPontavice provided his reconstruction of bottom temperatures based on GLORYS and ROMS outputs for the NEUS. The full 1.1 GB dataset is stored locally in data-raw/bottomtemp and the identical source file is [here on google drive](https://drive.google.com/file/d/11HanY5Tzu--77HevCV5gRdNvaAvuYVYU/view?usp=share_link).

Bottom temp has been filled in, see documentation and comparisons [here](https://noaa-edab.github.io/benthosindex/BottomTempFill.html).

# Modeling

## VAST model setup

Univariate or multivariate? First lets try separate univariate models for each of the prey groups, one model for macrobenthos and one for megabenthos. Then we can get fancier trying to look at a multivariate model?

Do we want to separate seasons or try an annual model? I guess it depends on how the Rpaths are set up.

DECISION: Fall is how the GOM is parameterized so start with that. Doing spring too.

These scripts tested fall and spring models, no covariates but estimating full spatial and spatio-temporal RE and anistotropy (current versions run with covariates and bias correction, linked below). 

https://github.com/NOAA-EDAB/benthosindex/blob/fc878aa03f4b744dc87407d4348fce8abdeb82d8/VASTscripts/VASTunivariate_macrobenthos_allsurvs.R

https://github.com/NOAA-EDAB/benthosindex/blob/fc878aa03f4b744dc87407d4348fce8abdeb82d8/VASTscripts/VASTunivariate_megabenthos_allsurvs.R

## VAST model selection

Lets do the same model selection as previously. Two stages, first looking at spatial, spatio-temporal random effects and the second looking at covariates. For completeness, do selection for both models.

Model selection script is 
https://github.com/NOAA-EDAB/benthosindex/blob/main/VASTscripts/VASTunivariate_benthos_modselection.R

### Stage 1 results

```{r}
# from each output folder in pyindex, 
outdir <- here::here("pyindex_modsel1")
moddirs <- list.dirs(outdir) 
moddirs <- moddirs[-1]
# keep folder name
modnames <- list.dirs(outdir, full.names = FALSE)


# function to apply extracting info
getmodinfo <- function(d.name){
  # read settings
  modpath <- stringr::str_split(d.name, "/", simplify = TRUE)
  modname <- modpath[length(modpath)]
  
  settings <- read.table(file.path(d.name, "settings.txt"), comment.char = "",
    fill = TRUE, header = FALSE)
  
  n_x <- as.numeric(as.character(settings[(which(settings[,1]=="$n_x")+1),2]))
  grid_size_km <- as.numeric(as.character(settings[(which(settings[,1]=="$grid_size_km")+1),2]))
  max_cells <- as.numeric(as.character(settings[(which(settings[,1]=="$max_cells")+1),2]))
  use_anisotropy <- as.character(settings[(which(settings[,1]=="$use_anisotropy")+1),2])
  fine_scale <- as.character(settings[(which(settings[,1]=="$fine_scale")+1),2])
  bias.correct <- as.character(settings[(which(settings[,1]=="$bias.correct")+1),2])
  
  #FieldConfig
  if(settings[(which(settings[,1]=="$FieldConfig")+1),1]=="Component_1"){
    omega1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+2),2])
    omega2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),1])
    epsilon1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),2])
    epsilon2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+5),1])
    beta1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+6),2])
    beta2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+7),1])
  }
  
  if(settings[(which(settings[,1]=="$FieldConfig")+1),1]=="Omega1"){
    omega1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),1])
    omega2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),1])
    epsilon1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),2])
    epsilon2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),2])
    beta1 <- "IID"
    beta2 <- "IID"
  }
  
  
  #RhoConfig
  rho_beta1 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+3),1]))
  rho_beta2 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+3),2]))
  rho_epsilon1 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+4),1]))
  rho_epsilon2 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+4),2]))
  
  # read parameter estimates, object is called parameter_Estimates
  if(file.exists(file.path(d.name, "parameter_estimates.RData"))) {
    load(file.path(d.name, "parameter_estimates.RData"))
    
    AIC <- parameter_estimates$AIC[1]  
    converged <- parameter_estimates$Convergence_check[1]
    fixedcoeff <- unname(parameter_estimates$number_of_coefficients[2])
    randomcoeff <- unname(parameter_estimates$number_of_coefficients[3])
    
  }else if(file.exists(file.path(d.name, "parameter_estimates.txt"))){
    
    reptext <- readLines(file.path(d.name, "parameter_estimates.txt"))
    
    AIC <- as.double(reptext[grep(reptext, pattern = "AIC")+1])
    converged <- reptext[grep(reptext, pattern = "Convergence_check")+1]
    fixedcoeff <- as.integer(stringr::str_split(reptext[grep(reptext, pattern = "number_of_coefficients")+2], 
                                     boundary("word"))[[1]][2])
    randomcoeff <- as.integer(stringr::str_split(reptext[grep(reptext, pattern = "number_of_coefficients")+2], 
                                     boundary("word"))[[1]][3])
    
  }else{
    
    AIC <- NA_real_
    converged <- NA_character_
    fixedcoeff <- NA_integer_
    randomcoeff <- NA_integer_
  }
  
  #index <- read.csv(file.path(d.name, "Index.csv"))
  
  
  # return model attributes as a dataframe
  out <- data.frame(modname = modname,
                    n_x = n_x,
                    grid_size_km = grid_size_km,
                    max_cells = max_cells,
                    use_anisotropy = use_anisotropy,
                    fine_scale =  fine_scale,
                    bias.correct = bias.correct,
                    omega1 = omega1,
                    omega2 = omega2,
                    epsilon1 = epsilon1,
                    epsilon2 = epsilon2,
                    beta1 = beta1,
                    beta2 = beta2,
                    rho_epsilon1 = rho_epsilon1,
                    rho_epsilon2 = rho_epsilon2,
                    rho_beta1 = rho_beta1,
                    rho_beta2 = rho_beta2,
                    AIC = AIC,
                    converged = converged,
                    fixedcoeff = fixedcoeff,
                    randomcoeff = randomcoeff#,
                    #index = index
  )
  	
	return(out)

}


modcompare <- purrr::map_dfr(moddirs, getmodinfo)
```

Models compared using REML are identified by model name ("modname" in Table Sb\@ref(tab:modsel1)) which always includes all prey aggregated, season ("all" for annual models of months 1-12, "fall" for models of months 7-12, and "spring" for models of months 1-6), number of knots (500 for all models), and which fixed and random spatial and spatio-temporal effects were included in which linear predictor (1 or 2). The names for model options and associated VAST model settings are:

Model selection 1 (spatial, spatio-temporal effects, no covariates) options (following @ng_predator_2021) and naming:
*  All models set Use_REML = TRUE in fit_model specifications.  
*  Modeled effects, model name suffix, and VAST settings by model:  

1.  "_alleffectson" = Spatial and spatio-temporal random effects plus anisotropy in both linear predictors; FieldConfig default (all IID)
1.  "_noaniso" = Spatial and spatio-temporal random effects in both linear predictors with anisotopy turned off; FieldConfig default (all IID) and `use_anisotopy = FALSE`
1.  "_noomeps2" = Spatial and spatio-temporal random effects plus anisotropy only in linear predictor 1;   FieldConfig = 0 for Omega2, Epsilon2
1.  "_noomeps2_noaniso" = Spatial and spatio-temporal random effects only in linear predictor 1 with anisotopy turned off;  FieldConfig = 0 for Omega2, Epsilon2 and `use_anisotopy = FALSE`
1. "_noomeps2_noeps1" =  Spatial random effects plus anisotropy only in linear predictor 1;   FieldConfig = 0 for Omega2, Epsilon2, Epsilon1
1.  "_noomeps2_noeps1_noaniso" = Spatial random effects only in linear predictor 1 with anisotopy turned off;   FieldConfig = 0 for Omega2, Epsilon2, Epsilon1 and `use_anisotopy = FALSE`
1.  "_noomeps12" = Anisotropy, but no spatial or spatio-temporal random effects in either linear predictor;   FieldConfig = 0 for both Omega, Epsilon 
1.  "_noomeps12_noaniso" = No spatial or spatio-temporal random effects in either linear predictor;  FieldConfig = 0 for both Omega, Epsilon  and `use_anisotopy = FALSE`


```{r modsel1}
modselect <- modcompare %>%
  dplyr::mutate(season = case_when(str_detect(modname, "_fall_") ~ "Fall",
                            str_detect(modname, "spring") ~ "Spring",
                            str_detect(modname, "_all_") ~ "Annual",
                            TRUE ~ as.character(NA))) %>%
  dplyr::mutate(converged2 = case_when(str_detect(converged, "no evidence") ~ "likely",
                                str_detect(converged, "is likely not") ~ "unlikely",
                                TRUE ~ as.character(NA))) %>%
  dplyr::mutate(bengroup = case_when(str_detect(modname, "macroben") ~ "macrobenthos",
                                str_detect(modname, "megaben") ~ "megabenthos",
                                TRUE ~ as.character(NA))) %>%
  #dplyr::mutate(modname = str_extract(modname, '(?<=allagg_).*')) |>
  dplyr::group_by(bengroup, season) %>%
  dplyr::mutate(deltaAIC = AIC-min(AIC)) %>%
  dplyr::select(modname, season, deltaAIC, fixedcoeff,
         randomcoeff, use_anisotropy, 
         omega1, omega2, epsilon1, epsilon2, 
         beta1, beta2, AIC, converged2) %>%
  dplyr::arrange(season, AIC)

# DT::datatable(modselect, rownames = FALSE, 
#               options= list(pageLength = 25, scrollX = TRUE),
#               caption = "Comparison of delta AIC values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures. See text for model descriptions.")

flextable::flextable(modselect %>%
                       dplyr::select(-c(use_anisotropy, 
         omega1, omega2, epsilon1, epsilon2, 
         beta1, beta2))) %>%
  flextable::set_header_labels(modname = "Model name",
                               season = "Season",
                               deltaAIC = "dAIC",
                               fixedcoeff = "N fixed",
                               randomcoeff = "N random",
                               converged2 = "Convergence") |>
  flextable::set_caption("Comparison of delta AIC (dAIC) values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures, with number of fixed (N fixed) and random (N random) coefficients. See text for model descriptions associated with each model name.") %>%
  flextable::fontsize(size = 9, part = "all") %>%
  flextable::colformat_double(digits = 2) |>
  flextable::set_table_properties(layout = "autofit", width = 1)
```

Using REML, models including spatial and spatio-temporal random effects as well as anisotropy were best supported by the data. This was true for the spring dataset and the fall dataset for both macrobenthos and megabenthos. 

Full outputs from stage 1 mmodel selection are posted to google drive [here](https://drive.google.com/drive/folders/1hiSkfavmvnSx0baCczty0upjFmK6YHin?usp=drive_link). 

All models in stage 2 included spatial and spatio-temporal random effects and anisotropy. 


### Stage 2 results

For the second step of model selection with different combinations of vessel effects and or catchability covariates, "modname" in Table Sb\@ref(tab:modsel2) follows a similar pattern as above, with  all prey aggregated ("allagg" for all models), season ("all" for annual models of months 1-12, "fall" for models of months 7-12, and "spring" for models of months 1-6), number of knots (500 for all models), and which vessel effects or covariates were included. The names for model options and associated VAST model settings are: 

Model selection 2 (covariates) options, FieldConfig default (all IID), with anisotropy:

1.  "_base" =          No vessel overdispersion or length/number covariates   
1.  "_len" =           Predator mean length covariate
1.  "_num" =           Number of predator species covariate
1.  "_lennum" =        Predator mean length and number of predator species covariates
1.  "_bt" =           Combined in situ and model interpolated bottom temp covariate
1.  "_lenbt" =        Predator mean length and bottom temp covariates
1.  "_numbt" =        Number of predator species and bottom temp covariates
1.  "_lennumbt" =     Predator mean length, number of predator species, and bottom temp covariates
1.  "_eta10" =         Overdispersion (vessel effect) in first linear predictor (prey encounter)
1.  "_eta11" =         Overdispersion (vessel effect) in both linear predictors (prey encounter and weight)


```{r modsel2}
# from each output folder in pyindex, 
outdir <- here("pyindex_modsel2")
moddirs <- list.dirs(outdir) 
moddirs <- moddirs[-1]
# keep folder name
modnames <- list.dirs(outdir, full.names = FALSE)

modcompare2 <- purrr::map_dfr(moddirs, getmodinfo)

modselect2 <- modcompare2 %>%
  dplyr::mutate(season = case_when(str_detect(modname, "_fall_") ~ "Fall",
                            str_detect(modname, "spring") ~ "Spring",
                            str_detect(modname, "_all_") ~ "Annual",
                            TRUE ~ as.character(NA))) %>%
  dplyr::mutate(converged2 = case_when(str_detect(converged, "no evidence") ~ "likely",
                                str_detect(converged, "is likely not") ~ "unlikely",
                                TRUE ~ as.character(NA))) %>%
    dplyr::mutate(bengroup = case_when(str_detect(modname, "macroben") ~ "macrobenthos",
                                str_detect(modname, "megaben") ~ "megabenthos",
                                TRUE ~ as.character(NA))) %>%
  dplyr::group_by(bengroup, season) %>%
  dplyr::mutate(deltaAIC = AIC-min(AIC, na.rm = TRUE)) %>%
  dplyr::select(modname, season, deltaAIC, fixedcoeff,
         randomcoeff, use_anisotropy, 
         omega1, omega2, epsilon1, epsilon2, 
         beta1, beta2, AIC, converged2) %>%
  dplyr::arrange(season, AIC)

# DT::datatable(modselect2, rownames = FALSE, 
#               options= list(pageLength = 50, scrollX = TRUE),
#               caption = "Comparison of delta AIC values for alternative vessel effects and catchability covariates. See text for model descriptions.")

flextable::flextable(modselect2%>%
                       dplyr::select(-c(use_anisotropy, 
         omega1, omega2, epsilon1, epsilon2, 
         beta1, beta2))) %>%
  flextable::set_header_labels(modname = "Model name",
                               season = "Season",
                               deltaAIC = "dAIC",
                               fixedcoeff = "N fixed coefficients",
                               randomcoeff = "N random coefficients",
                               converged2 = "Convergence") |>
  flextable::set_caption("Comparison of delta AIC (dAIC) values for alternative vessel effects and catchability covariates. See text for model descriptions associated with each model name.") %>%
  flextable::fontsize(size = 9, part = "all") %>%
  flextable::colformat_double(digits = 2) |>
  flextable::set_table_properties(layout = "autofit", width = .9)


```

Overall, combinations of catchability covariates were better supported by the data than vessel effects. Model comparisons above led us to the best model fit using mean predator length, number of predator species, and bottom temperature at a survey station as catchability covariates. 

Full results from model selection for covariates are posted to the google drive [here](https://drive.google.com/drive/folders/1xT7Pt8zzvQOsqwJifT_VI5hvl_pQzrjM?usp=drive_link).

## VAST model bias correction and results

Scripts for this are https://github.com/NOAA-EDAB/benthosindex/blob/main/VASTscripts/VASTunivariate_megabenthos_allsurvs.R and https://github.com/NOAA-EDAB/benthosindex/blob/main/VASTscripts/VASTunivariate_macrobenthos_allsurvs.R 

Results have been posted to the google drive [here](https://drive.google.com/drive/folders/1mhoSNmmK0rE374ofBk2iUDmbjcErIRvA?usp=drive_link). See BenthosResults for the initial views and interpretation.