# Full workflow
![](https://badgen.net/badge/status/WIP/orange)
## Preface
This is an example of me (Julie) "talking" the reader through conducting an iSSA on the
fisher data that is set up in our
[targets workflow](https://github.com/iSSA-guild/targets-issa).
## Preparing an iSSA
### The data
The critical data for running an iSSA are the x and y coordinates of the GPS
relocations and the associated datetime stamp. Depending on what you're doing,
you'll probably also want to know which individual each relocation belongs to.
There are several things that need to be done prior to actually running an iSSA.
Overarchingly:
1. Clean the GPS data
2. Prepare the layers and other covariate data
#### Clean the GPS data
Things you need to check, and clean if issues turn up (a minimal sketch follows this list):
1. Make sure all observations (GPS points) are complete
- if not, remove incomplete observations
2. Check if there are duplicated GPS points (two taken at the same time)
- remove duplicate observations
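A minimal cleaning sketch with `data.table` (the file and column names are assumptions; match them to your data):
```{r clean_sketch, eval = FALSE}
library(data.table)

DT <- fread('gps.csv')

# 1. drop incomplete observations
DT <- DT[!is.na(x) & !is.na(y) & !is.na(datetime)]

# 2. drop duplicated fixes (same individual, same timestamp)
DT <- unique(DT, by = c('id', 'datetime'))
```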
#### Prepare covariate input data
It is critical that all of the layers you plan to use and your GPS points are in
the same coordinate reference system. Reprojecting layers, cropping them, and
doing whatever else it takes to get everything into the same "world" can take a
long time, but it is essential for the later steps of extracting covariates
by step.
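For example, with `terra` (the file names and target CRS here are assumptions, not from the workflow):
```{r crs_sketch, eval = FALSE}
library(terra)

landcover <- rast('landcover.tif')
dist_to_water <- rast('dist_to_water.tif')

# reproject into the CRS of the GPS data
# (categorical layers need method = 'near' to keep class values intact)
landcover <- project(landcover, 'EPSG:32618', method = 'near')
dist_to_water <- project(dist_to_water, 'EPSG:32618')

# crop to a shared extent
dist_to_water <- crop(dist_to_water, ext(landcover))
```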
### Making tracks
Once the data is clean, we make the points into a track. If you have more than
one individual in the dataset, do it by individual.
Generating tracks:
```{r tracks_expr, echo = FALSE, comment = ''}
get_target_expression(targets_tracks, 'tracks')
```
#### Check the sampling rate
GPS relocations are messy, so now we should check what the sampling rate
actually looks like.
```{r samp_rate}
summarize_sampling_rate_many(tar_read(tracks), 'id')
```
For an iSSA, we need the GPS fixes to happen at regular intervals, the same for
all individuals in the model, so we likely need to resample the data.
```{r resample_expr, echo = FALSE, comment = ''}
get_target_expression(targets_tracks, 'tracks_resampled')
```
In our `targets` workflow, we have simplified all of this. Here, all you have
to do is fill in the [Variables] at the beginning of the workflow to match
your data.
```{r tar_vars, comment = '', echo=FALSE}
l <- readLines('https://raw.githubusercontent.com/robitalec/targets-issa/main/_targets.R')
writeLines(l[grep('Variables', l):(grep('Targets: data', l)-1)])
```
### Steps
Now we have tracks, but a step selection analysis requires steps! So, in our
targets workflow, we only keep data for individuals that have at least 3 steps,
and this is adjustable depending on what you need.
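As a rough sketch of what this looks like per individual in plain `amt` (object names assumed; the workflow's targets handle this for you):
```{r steps_sketch, eval = FALSE}
library(amt)

# steps are only made within bursts of regularly spaced fixes
stps <- steps_by_burst(track_resampled)

# keep the individual only if it has at least 3 steps (adjustable)
if (nrow(stps) < 3) stps <- NULL
```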
#### Check the distribution of steps and turn angles
Now to make sure there aren't any biological impossibilities in the steps.
Usually there are a lot of little steps and few big steps; this check would
reveal, for instance, steps that are not biologically possible, so that
erroneous observations can be removed. It also shows the distribution, so you
know whether to use a gamma or exponential distribution for generating random
steps. Most step-length distributions are gamma, and that is the default
in `amt`.
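If you want to check the fitted distributions directly, `amt::fit_distr` gives the tentative parameters (a sketch, assuming steps in `stps`):
```{r fit_distr_sketch, eval = FALSE}
library(amt)

fit_distr(stps$sl_, 'gamma')     # tentative step-length distribution
fit_distr(stps$ta_, 'vonmises')  # tentative turn-angle distribution
```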
Step length (SL) distribution for one individual:
```{r plot_dist_sl}
tar_read(dist_sl_plots, 1)
```
Looks like a gamma distribution, so that's good.
Turn angle (TA) distribution for one individual:
```{r plot_ta_dist}
tar_read(dist_ta_plots, 1)
```
Looks like a von Mises distribution, so that's also good.
### Prepare the data to go into the iSSA
So, at this point, we have our observed steps and know what their distributions
look like. Now we can use this information to properly inform our random steps
with the correct distribution. As we create the random steps, we will also
extract the covariates of interest at each step.
#### Motivation vs Selection
One of the critical decisions in your analysis is whether you want to know if a
variable is motivating why/how an animal moves, or if the animal is selecting
for that variable. This decision determines whether you use the information from
the start of the step (motivation) or the end of the step (selection) in your
analysis, and it is of course driven by your question and hypotheses.
See [Choosing covariates] for more info.
#### Example extraction
For example, here we are creating random steps and extracting the landcover
class and distance to water an individual selects (at the end of each step) and
the time of day (day or night, at the start of each step).
This creates a dataset that includes step length, turn angle, whether it was
day or night at the start of the step, and what landcover class the animal is
on at the end of its step.
*NOTE: In the `targets` workflow, we have created our own extraction functions
instead of using `amt`'s `extract_covariates` and `time_of_day` functions. This
is for several reasons: more flexibility; `amt` isn't functional with `terra`
yet (coming soon); and `time_of_day` is good for day vs. night, but not great
for dawn or dusk at larger fix rates. For instance, with a 2 hour fix rate,
fixes rarely land in dawn or dusk because those time frames are narrow in the
default `maptools` function that `time_of_day` wraps. In that circumstance, we
have created our own time of day extractions as needed by location. However,
for our fisher example here with a 30 minute fix rate, including the default
crepuscular times should be fine.*
```{r extracting, comment =''}
# R/extract_layers.R
extract_layers
```
If you have multiple hypotheses that would benefit from knowing information
from both the beginning and the end of the step, you can input `"both"` instead
of `"start"` or `"end"`.
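For comparison, the equivalent steps in plain `amt` look roughly like this (object names assumed; these are not the workflow's functions):
```{r amt_extract_sketch, eval = FALSE}
library(amt)

rand <- random_steps(stps, n_control = 10)  # gamma SL and von Mises TA by default
rand <- extract_covariates(rand, landcover, where = 'end')
rand <- time_of_day(rand, where = 'start')
```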
So, in the end, we get a dataset that looks something like:
```{r extracted_dat, echo=FALSE}
tar_read(tracks_extract)[1:11]
```
## Running an iSSA
This workflow is going to be an example using the Poisson point process method
described by Muff et al. [-@Muff_2020]. This method allows us to look at all
individuals at once, so we can examine both individual- and population-level
responses in one model. In this example, we're going to include both categorical
and continuous variables.
*NOTE: Random effects in GLMMs are supposed to have 5 or more levels, which
means you should have at least 5 individuals if you want to include them as a
random effect.*
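For orientation, the core of the Muff et al. trick is a conditional Poisson model with the stratum-level intercept variance fixed at a large value, so the strata behave like conditional-logistic strata. A minimal sketch with `glmmTMB` for a single continuous covariate (the data object `dat` is an assumption; the column names match this workflow):
```{r muff_sketch, eval = FALSE}
library(glmmTMB)

fit <- glmmTMB(
  case_ ~ -1 + dist_to_water + (1 | indiv_step_id) + (0 + dist_to_water | id),
  family = poisson, data = dat,
  # theta holds log-SDs; fix the stratum SD at 1e3 and estimate the rest
  map = list(theta = factor(c(NA, 1))),
  start = list(theta = c(log(1e3), 0))
)
```
The function we actually use in the workflow is: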
```{r f_model_land_cover, comment =''}
# R/model_land_cover.R
model_land_cover
```
```{r summary_model_lc}
summary(tar_read(model_lc))
```
When looking at the output of this model, we see some issues. First of all,
there are some warnings from the model. The non-positive-definite Hessian
warning is thrown liberally by `glmmTMB`, but the real sign of a problem is the
`NA`s reported for the variance estimates. This model didn't converge properly,
so we need to look at why.
Things to explore at the data level include:
- lots of `NAs` for a particular variable [(helpful tips from Bolker)](https://stackoverflow.com/questions/62239351/why-am-i-getting-nas-in-the-model-summary-output-zero-inflated-glmm-with-glmmtm)
- lots of variation or very little variation in a particular variable
- availability of variables for individuals
Things to explore at the model level include:
- the model is over-parameterized for the amount of data
- the fixed high variance is assigned at the strata level, not to another random effect
- variables are similarly scaled
- if using the Muff method, the strata represent the step id by individual
  (`indiv_step_id`), not the step id assigned by `amt`, because the way we have
  it coded, the step numbers restart for each individual
- categorical variables are classified as factors, especially `indiv_step_id`
- random effects have at least 5 levels (this is likely not going to be a
problem for most iSSAs, but it has come up)
```{r exploring_data}
# checking availability
# for landcover
tar_read(avail_lc)
# for distance to water
tracks_extract <- tar_read(tracks_extract)
tracks_extract[!(case_), quantile(dist_to_water, na.rm = TRUE), by = id]
```
After some exploration, I'm thinking this model is over-parameterized for 6
individuals, at least with so many landcover classes. So, I'm going to pretend
for this example that we only care about selection for forest and disturbed
habitats. We created dummy variables for this based on the extracted landcover
data (a sketch is below) and ran the iSSA on that data.
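A minimal `data.table` sketch of those dummy variables (the landcover column name and class labels are assumptions; match them to your extracted values):
```{r dummy_sketch, eval = FALSE}
tracks_extract[, forest := as.integer(lc == 'forest')]
tracks_extract[, disturbed := as.integer(lc == 'disturbed')]
```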
```{r f_model_forest, comment =''}
# R/model_forest_bin.R
model_forest_bin
```
```{r summary_model_forest}
summary(tar_read(model_forest))
```
This model runs without issue.
## Interpretation
Demonstrating effects from iSSA output is an area where you will see a lot of
different approaches at this point. Some people just show the beta estimates
from the model and p-values. This, however, can result in misinterpretation of
biological significance [@Avgar2017].
### Relative Selection Strength
This is where Relative Selection Strength (RSS) comes in. Avgar et al. [-@Avgar2017]
derived various equations to calculate RSS depending on whether variables are on
their own, interacted with others, log-transformed, etc. However, since then,
Brian Smith has shown that the `predict()` function can achieve the same
outcome (see the proof in [Smith 2020](https://bsmity13.github.io/log_rss/)).
But those are the mathematical details.
What RSSs are really doing is comparing the selection strength for habitat at
location 1 (H1) to habitat at location 2 (H2). So, say an animal is not in a
forest: are they relatively more likely to select to be in a forest or to avoid
it? The RSS then shows the strength of selection or avoidance over a range of
values of that habitat type.
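In symbols: for a fitted model with coefficients $\beta$ and covariates $h(\cdot)$, the log-RSS comparing H1 to H2 is just the difference of the two linear predictors,

$$\log \mathrm{RSS}(H_1, H_2) = \beta^\top \big( h(H_1) - h(H_2) \big)$$

which is exactly what subtracting two `predict(..., type = 'link')` calls computes.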
In the targets-issa workflow, we predict H1 and H2 for forest using
the following functions:
```{r RSS_forest_ex}
# R/predict_h1_forest.R
predict_h1_forest
# R/predict_h2.R
predict_h2
# Then we can calculate RSS for forest as H1 - H2:
# R/calc_rss.R
calc_rss
```
And finally, we make the plots with the `plot_rss` function.
```{r rss_forest_plot}
tar_read(plot_rss_forest)
```
JWT comment: I personally find it hard to wrap my head around RSSs for
categorical variables, and I don't have a great way of explaining it for others
like me who don't get it. The closest I can come is: looking at this plot, if
the individual starts outside the forest, is its next step more likely to go to
forest or not? The continuous nature of the RSS is hard for me to grasp with a
categorical variable. I just pretend in my head that it's a proportional value
where 'no forest' = 0 and 'all forest' = 1, and the middle is an estimate based
on the model. It's not perfect, but that's the best I've been able to come to
terms with it. :shrug: #OverlyHonestMethods
We can then do the same thing to determine the RSS for distance to water.
```{r RSS_water_ex}
tar_read(plot_rss_water)
```
<!-- #TODO insert description of the output graphs -->
RSSs have to be done one habitat at a time. That said, interactions can be
explored by pre-determining cut-off points so that you examine one of the
variables at a time. I know that sounds confusing, so, for example: say we have
selection of forest interacted with distance to road, and we want to understand
how animals select for forest when they are near or far from roads. We could
define being near a road as being within 100 m of the road and being far as
5000 m from the road. These values are semi-arbitrary, but they should be based
on the biology of the animal. This complicates the calculation of the RSS a
bit, making it a two-step process with the predict method: first calculate the
overall selection for, in this case, forest at 100 m from the road, then run
the predict function over the availability of forest and add them together.
<!-- # TODO: make up some example, there's not one relevant in the current toy model -->
```{r RSS_interaction}
```
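In the meantime, here is a rough, purely hypothetical sketch of that two-step process (the model `model_forest_road` and the column names are made up; our toy model has no forest-by-road interaction):
```{r rss_interaction_sketch, eval = FALSE}
# habitats to compare: a range of forest availability, near a road (100 m)
h1_near <- data.frame(forest = seq(0, 1, length.out = 50), dist_to_road = 100,
                      id = NA, indiv_step_id = NA)
# reference habitat: no forest, near a road
h2_near <- data.frame(forest = 0, dist_to_road = 100,
                      id = NA, indiv_step_id = NA)

log_rss_near <- predict(model_forest_road, newdata = h1_near,
                        type = 'link', re.form = NA) -
  predict(model_forest_road, newdata = h2_near, type = 'link', re.form = NA)

# repeat with dist_to_road = 5000 for the "far from road" comparison
```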
### Speed
Speed is a different kettle of fish. Some people show box plots of step length
to give an idea of speed, but that captures only part of the effect size. When
using a gamma distribution for an iSSA, the mean estimated speed is the shape
parameter multiplied by the scale parameter of the gamma distribution. However,
the shape is modified by the movement variables in the iSSA itself, so the
calculation becomes

$$\text{mean estimated speed} = \left( \text{shape} + \beta_{\log(\mathrm{SL})} + \beta_{\log(\mathrm{SL}) \times X} \cdot \bar{X} \right) \times \text{scale}$$

where $X$ is the covariate interacted with log step length.
So, from our toy example, this could be done as below. Here, we can pick which
covariate to vary when estimating speed, to see how speed is affected by what
it is interacted with. `seq` represents the range of values we're looking over.
```{r calc_speed_code}
# R/calc_speed.R
calc_speed
```
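For intuition, a hand-rolled version of that calculation might look like this (object and coefficient names are assumptions, not taken from the workflow):
```{r speed_sketch, eval = FALSE}
# shape and scale from the tentative gamma fit to observed step lengths
shape <- sl_distr$params$shape
scale <- sl_distr$params$scale

# fixed effects from the fitted model
betas <- glmmTMB::fixef(model)$cond

x <- 0:1  # e.g. disturbed: 0 = no, 1 = yes
(shape + betas[['log_sl_']] + betas[['log_sl_:disturbed']] * x) * scale
```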
Let's see how fishers' speed changes in areas without anthropogenic disturbance vs. areas that are disturbed (`seq = 0:1`).
```{r plot_speed}
tar_read(plot_speed_disturbed)
```