# Full workflow
![](https://badgen.net/badge/status/WIP/orange)
## Preface
This is an example of me (Julie) "talking" the reader through conducting an iSSA on the
fisher data that is set up in our
[targets workflow](https://github.com/iSSA-guild/targets-issa).
## Preparing an iSSA
### The data
The critical data for running an iSSA are the x and y coordinates of the GPS
relocations and the associated datetime stamp. Depending on what you're doing,
you'll probably also want to know which individual each relocation belongs to.
There are several things that need to be done prior to actually running an iSSA.
Overarchingly:
1. Clean the GPS data
2. Prepare the layers and other covariate data
#### Clean the GPS data
Things you need to check, and clean if issues turn up (a minimal sketch follows this list):
1. Make sure all observations (GPS points) are complete
- if not, remove incomplete observations
2. Check if there are duplicated GPS points (two taken at the same time)
- remove duplicate observations
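A minimal cleaning sketch with `data.table` (the file and column names are assumptions; match them to your data):
```{r clean_sketch, eval = FALSE}
library(data.table)

DT <- fread('gps.csv')

# 1. drop incomplete observations
DT <- DT[!is.na(x) & !is.na(y) & !is.na(datetime)]

# 2. drop duplicated fixes (same individual, same timestamp)
DT <- unique(DT, by = c('id', 'datetime'))
```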
#### Prepare covariate input data
It is critical that all of the layers you plan to use and your GPS points are in
the same coordinate reference system. Reprojecting layers, cropping them, and
doing whatever else it takes to get everything into the same "world" can take a
long time, but it is essential for the later steps of extracting covariates
by step.
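For example, with `terra` (the file names and target CRS here are assumptions, not from the workflow):
```{r crs_sketch, eval = FALSE}
library(terra)

landcover <- rast('landcover.tif')
dist_to_water <- rast('dist_to_water.tif')

# reproject into the CRS of the GPS data
# (categorical layers need method = 'near' to keep class values intact)
landcover <- project(landcover, 'EPSG:32618', method = 'near')
dist_to_water <- project(dist_to_water, 'EPSG:32618')

# crop to a shared extent
dist_to_water <- crop(dist_to_water, ext(landcover))
```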
### Making tracks
Once the data is clean, we make the points into a track. If you have more than
one individual in the dataset, do it by individual.
Generating tracks:
```{r tracks_expr, echo = FALSE, comment = ''}
get_target_expression(targets_tracks, 'tracks')
```
#### Check the sampling rate
GPS relocations are messy, so now we should check what the sampling rate
actually looks like.
```{r samp_rate}
summarize_sampling_rate_many(tar_read(tracks), 'id')
```
For an iSSA, we need the GPS fixes to happen at regular intervals, the same for
all individuals in the model, so we likely need to resample the data.
```{r resample_expr, echo = FALSE, comment = ''}
get_target_expression(targets_tracks, 'tracks_resampled')
```
In our `targets` workflow, we have simplified all of this. Here, all you have
to do is fill in the [Variables] at the beginning of the workflow to match
your data.
```{r tar_vars, comment = '', echo=FALSE}
l <- readLines('https://raw.githubusercontent.com/robitalec/targets-issa/main/_targets.R')
writeLines(l[grep('Variables', l):(grep('Targets: data', l)-1)])
```
### Steps
Now we have tracks, but a step selection analysis requires steps! So, in our
targets workflow, we only keep data for individuals that have at least 3 steps,
and this is adjustable depending on what you need.
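As a rough sketch of what this looks like per individual in plain `amt` (object names assumed; the workflow's targets handle this for you):
```{r steps_sketch, eval = FALSE}
library(amt)

# steps are only made within bursts of regularly spaced fixes
stps <- steps_by_burst(track_resampled)

# keep the individual only if it has at least 3 steps (adjustable)
if (nrow(stps) < 3) stps <- NULL
```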
#### Check the distribution of steps and turn angles
Now to make sure there aren't any biological impossibilities in the steps.
Usually there are a lot of little steps and few big steps; this check would
reveal, for instance, steps that are not biologically possible, so that
erroneous observations can be removed. It also shows the distribution, so you
know whether to use a gamma or exponential distribution for generating random
steps. Most step-length distributions are gamma, and that is the default
in `amt`.
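If you want to check the fitted distributions directly, `amt::fit_distr` gives the tentative parameters (a sketch, assuming steps in `stps`):
```{r fit_distr_sketch, eval = FALSE}
library(amt)

fit_distr(stps$sl_, 'gamma')     # tentative step-length distribution
fit_distr(stps$ta_, 'vonmises')  # tentative turn-angle distribution
```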
Step length (SL) distribution for one individual:
```{r plot_dist_sl}
tar_read(dist_sl_plots, 1)
```
Looks like a gamma distribution, so that's good.
Turn angle (TA) distribution for one individual:
```{r plot_ta_dist}
tar_read(dist_ta_plots, 1)
```
Looks like a von Mises distribution, so that's also good.
### Prepare the data to go into the iSSA
So, at this point, we have our observed steps and know what their distributions
look like. Now we can use this information to properly inform our random steps
with the correct distribution. As we create the random steps, we will also
extract the covariates of interest at each step.
#### Motivation vs Selection
One of the critical decisions in your analysis is whether you want to know if a
variable is motivating why/how an animal moves, or if the animal is selecting
for that variable. This decision determines whether you use the information from
the start of the step (motivation) or the end of the step (selection) in your
analysis, and it is of course driven by your question and hypotheses.
See [Choosing covariates] for more info.
#### Example extraction
For example, here we are creating random steps and extracting the landcover
class and distance to water an individual selects (at the end of each step) and
the time of day (day or night, at the start of each step).
This creates a dataset that includes step length, turn angle, whether it was
day or night at the start of the step, and what landcover class the animal is
on at the end of its step.
*NOTE: In the `targets` workflow, we have created our own extraction functions
instead of using `amt`'s `extract_covariates` and `time_of_day` functions. This
is for several reasons: more flexibility; `amt` isn't functional with `terra`
yet (coming soon); and `time_of_day` is good for day vs. night, but not great
for dawn or dusk at larger fix rates. For instance, with a 2 hour fix rate,
fixes rarely land in dawn or dusk because those time frames are narrow in the
default `maptools` function that `time_of_day` wraps. In that circumstance, we
have created our own time of day extractions as needed by location. However,
for our fisher example here with a 30 minute fix rate, including the default
crepuscular times should be fine.*
```{r extracting, comment =''}
# R/extract_layers.R
extract_layers
```
If you have multiple hypotheses that would benefit from knowing information
from both the beginning and the end of the step, you can input `"both"` instead
of `"start"` or `"end"`.
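For comparison, the equivalent steps in plain `amt` look roughly like this (object names assumed; these are not the workflow's functions):
```{r amt_extract_sketch, eval = FALSE}
library(amt)

rand <- random_steps(stps, n_control = 10)  # gamma SL and von Mises TA by default
rand <- extract_covariates(rand, landcover, where = 'end')
rand <- time_of_day(rand, where = 'start')
```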
So, in the end, we get a dataset that looks something like:
```{r extracted_dat, echo=FALSE}
tar_read(tracks_extract)[1:11]
```
## Running an iSSA
This workflow is going to be an example using the Poisson point process method
described by Muff et al. [-@Muff_2020]. This method allows us to look at all
individuals at once, so we can examine both individual- and population-level
responses in one model. In this example, we're going to include both categorical
and continuous variables.
*NOTE: Random effects in GLMMs are supposed to have 5 or more levels, which
means you should have at least 5 individuals if you want to include them as a
random effect.*
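For orientation, the core of the Muff et al. trick is a conditional Poisson model with the stratum-level intercept variance fixed at a large value, so the strata behave like conditional-logistic strata. A minimal sketch with `glmmTMB` for a single continuous covariate (the data object `dat` is an assumption; the column names match this workflow):
```{r muff_sketch, eval = FALSE}
library(glmmTMB)

fit <- glmmTMB(
  case_ ~ -1 + dist_to_water + (1 | indiv_step_id) + (0 + dist_to_water | id),
  family = poisson, data = dat,
  # theta holds log-SDs; fix the stratum SD at 1e3 and estimate the rest
  map = list(theta = factor(c(NA, 1))),
  start = list(theta = c(log(1e3), 0))
)
```
The function we actually use in the workflow is: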
```{r f_model_land_cover, comment =''}
# R/model_land_cover.R
model_land_cover
```
```{r summary_model_lc}
summary(tar_read(model_lc))
```
When looking at the output of this model, we see some issues. First of all,
there are some warnings from the model. The non-positive-definite Hessian
warning is thrown liberally by `glmmTMB`, but the real sign of a problem is the
`NA`s reported for the variance estimates. This model didn't converge properly,
so we need to look at why.
Things to explore at the data level include:
- lots of `NAs` for a particular variable [(helpful tips from Bolker)](https://stackoverflow.com/questions/62239351/why-am-i-getting-nas-in-the-model-summary-output-zero-inflated-glmm-with-glmmtm)
- lots of variation or very little variation in a particular variable
- availability of variables for individuals
Things to explore at the model level include:
- the model is over-parameterized for the amount of data
- the fixed high variance is assigned at the strata level, not to another random effect
- variables are similarly scaled
- if using the Muff method, the strata represent the step id by individual
  (`indiv_step_id`), not the step id assigned by `amt`, because the way we have
  it coded, the step numbers restart for each individual
- categorical variables are classified as factors, especially `indiv_step_id`
- random effects have at least 5 levels (this is likely not going to be a
problem for most iSSAs, but it has come up)
```{r exploring_data}
# checking availability
# for landcover
tar_read(avail_lc)
# for distance to water
tracks_extract <- tar_read(tracks_extract)
tracks_extract[!(case_), quantile(dist_to_water, na.rm = TRUE), by = id]
```
After some exploration, I'm thinking this model is over-parameterized for 6
individuals, at least with so many landcover classes. So, I'm going to pretend
for this example that we only care about selection for forest and disturbed
habitats. We created dummy variables for this based on the extracted landcover
data (a sketch is below) and ran the iSSA on that data.
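A minimal `data.table` sketch of those dummy variables (the landcover column name and class labels are assumptions; match them to your extracted values):
```{r dummy_sketch, eval = FALSE}
tracks_extract[, forest := as.integer(lc == 'forest')]
tracks_extract[, disturbed := as.integer(lc == 'disturbed')]
```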
```{r f_model_forest, comment =''}
# R/model_forest_bin.R
model_forest_bin
```
```{r summary_model_forest}
summary(tar_read(model_forest))
```
This model runs without issue.
## Interpretation
Demonstrating effects from iSSA output is an area where you will see a lot of
different approaches at this point. Some people just show the beta estimates
from the model and p-values. This, however, can result in misinterpretation of
biological significance [@Avgar2017].
### Relative Selection Strength
This is where Relative Selection Strength (RSS) comes in. Avgar et al. [-@Avgar2017]
derived various equations to calculate RSS depending on whether variables are on
their own, interacted with others, log-transformed, etc. However, since then,
Brian Smith has shown that the `predict()` function can achieve the same
outcome (see the proof in [Smith 2020](https://bsmity13.github.io/log_rss/)).
But those are the mathematical details.
What RSSs are really doing is comparing the selection strength for habitat at
location 1 (H1) to habitat at location 2 (H2). So, say an animal is not in a
forest: are they relatively more likely to select to be in a forest or to avoid
it? The RSS then shows the strength of selection or avoidance over a range of
values of that habitat type.
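In symbols: for a fitted model with coefficients $\beta$ and covariates $h(\cdot)$, the log-RSS comparing H1 to H2 is just the difference of the two linear predictors,

$$\log \mathrm{RSS}(H_1, H_2) = \beta^\top \big( h(H_1) - h(H_2) \big)$$

which is exactly what subtracting two `predict(..., type = 'link')` calls computes.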
In the targets-issa workflow, we predict H1 and H2 for forest using
the following functions:
```{r RSS_forest_ex}
# R/predict_h1_forest.R
predict_h1_forest
# R/predict_h2.R
predict_h2
# Then we can calculate RSS for forest as H1 - H2:
# R/calc_rss.R
calc_rss
```
And finally, we make the plots with the `plot_rss` function.
```{r rss_forest_plot}
tar_read(plot_rss_forest)
```
JWT comment: I personally find it hard to wrap my head around RSSs for
categorical variables, and I don't have a great way of explaining it for others
like me who don't get it. The closest I can come is: looking at this plot, if
the individual starts outside the forest, is its next step more likely to go to
forest or not? The continuous nature of the RSS is hard for me to grasp with a
categorical variable. I just pretend in my head that it's a proportional value
where 'no forest' = 0 and 'all forest' = 1, and the middle is an estimate based
on the model. It's not perfect, but that's the best I've been able to come to
terms with it. :shrug: #OverlyHonestMethods
We can then do the same thing to determine the RSS for distance to water.
```{r RSS_water_ex}
tar_read(plot_rss_water)
```
<!-- #TODO insert description of the output graphs -->
RSSs have to be done one habitat at a time. That said, interactions can be
explored by pre-determining cut-off points so that you examine one of the
variables at a time. I know that sounds confusing, so, for example: say we have
selection of forest interacted with distance to road, and we want to understand
how animals select for forest when they are near or far from roads. We could
define being near a road as being within 100 m of the road and being far as
5000 m from the road. These values are semi-arbitrary, but they should be based
on the biology of the animal. This complicates the calculation of the RSS a
bit, making it a two-step process with the predict method: first calculate the
overall selection for, in this case, forest at 100 m from the road, then run
the predict function over the availability of forest and add them together.
<!-- # TODO: make up some example, there's not one relevant in the current toy model -->
```{r RSS_interaction}
```
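In the meantime, here is a rough, purely hypothetical sketch of that two-step process (the model `model_forest_road` and the column names are made up; our toy model has no forest-by-road interaction):
```{r rss_interaction_sketch, eval = FALSE}
# habitats to compare: a range of forest availability, near a road (100 m)
h1_near <- data.frame(forest = seq(0, 1, length.out = 50), dist_to_road = 100,
                      id = NA, indiv_step_id = NA)
# reference habitat: no forest, near a road
h2_near <- data.frame(forest = 0, dist_to_road = 100,
                      id = NA, indiv_step_id = NA)

log_rss_near <- predict(model_forest_road, newdata = h1_near,
                        type = 'link', re.form = NA) -
  predict(model_forest_road, newdata = h2_near, type = 'link', re.form = NA)

# repeat with dist_to_road = 5000 for the "far from road" comparison
```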
### Speed
Speed is a different kettle of fish. Some people show box plots of step length
to give an idea of speed, but that captures only part of the effect size. When
using a gamma distribution for an iSSA, the mean estimated speed is the shape
parameter multiplied by the scale parameter of the gamma distribution. However,
the shape is modified by the movement variables in the iSSA itself, so the
calculation becomes

$$\text{mean estimated speed} = \left( \text{shape} + \beta_{\log(\mathrm{SL})} + \beta_{\log(\mathrm{SL}) \times X} \cdot \bar{X} \right) \times \text{scale}$$

where $X$ is the covariate interacted with log step length.
So, from our toy example, this could be done as below. Here, we can pick which
covariate to vary when estimating speed, to see how speed is affected by what
it is interacted with. `seq` represents the range of values we're looking over.
```{r calc_speed_code}
# R/calc_speed.R
calc_speed
```
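For intuition, a hand-rolled version of that calculation might look like this (object and coefficient names are assumptions, not taken from the workflow):
```{r speed_sketch, eval = FALSE}
# shape and scale from the tentative gamma fit to observed step lengths
shape <- sl_distr$params$shape
scale <- sl_distr$params$scale

# fixed effects from the fitted model
betas <- glmmTMB::fixef(model)$cond

x <- 0:1  # e.g. disturbed: 0 = no, 1 = yes
(shape + betas[['log_sl_']] + betas[['log_sl_:disturbed']] * x) * scale
```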
Let's see how fishers' speed changes in areas without anthropogenic disturbance vs. areas that are disturbed (`seq = 0:1`).
```{r plot_speed}
tar_read(plot_speed_disturbed)
```