%% LyX 2.2.1 created this file. For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass{article}
\usepackage[sc]{mathpazo}
\usepackage[T1]{fontenc}
\usepackage{geometry}
\geometry{verbose,tmargin=2.5cm,bmargin=2.5cm,lmargin=2.5cm,rmargin=2.5cm}
\setcounter{secnumdepth}{2}
\setcounter{tocdepth}{2}
\usepackage{url}
\usepackage{natbib}
\usepackage[unicode=true,pdfusetitle,
bookmarks=true,bookmarksnumbered=true,bookmarksopen=true,bookmarksopenlevel=2,
breaklinks=false,pdfborder={0 0 1},backref=false,colorlinks=false]
{hyperref}
\hypersetup{
pdfstartview={XYZ null null 1}}
\usepackage{breakurl}
\include{newcom}
\begin{document}
<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
# set global chunk options
opts_chunk$set(fig.path='figure/minimal-', fig.align='center', fig.show='hold')
options(formatR.arrow=TRUE,width=90)
@
\title{THE TAG LOCATION PROBLEM}
\label{chapter2}
\author{Michael D. Sumner}
\maketitle
This chapter discusses problems faced with tracking data that concern
the estimation of location and provides a flexible software
environment for exploring data and applying solutions. Examples are
used to illustrate the variety of problems and some of the limitations
of the traditional techniques by which tracking data are analysed. The
\pkg{trip} package is a dedicated suite of tools developed by the
author in the software language \proglang{R}. This package integrates
data handling and analytical techniques for track data with broader
GIS and spatial data tools. This makes track data handling tools
easily available, even for those without strong programming
skills. The chapter concludes by extending the concerns regarding
the limitations of traditional techniques to methods for deriving
locations from raw data.
% First we discuss the goals of location estimation for animal tracking
% and outline the problems involved even in ideal conditions. I discuss
% the additional problems encountered in practice, and the problems of
% imperfect data collection and logistical difficulties.
This chapter is not intended to be a critique of modern methods of
dealing with tracking data, but introduces the variety of issues
encountered and tools for addressing them. Simple-to-use tools for
handling spatial and temporal data are still rare and some of the
problems encountered cause difficulties for researchers before they
have an opportunity to explore sophisticated methods. The aim here is
to illustrate some classical techniques within a software toolkit that
provides better control over the details of analysis tools for those
without advanced programming skills. Later chapters present solutions
for the remaining problems. Work by \cite{patterson2008state} and
\cite{breedthesis} provide a more critical review of recent methods.
%% Chapter2 - ensure that AIMs are specified up front:
Aims of this chapter:
\begin{enumerate}
\item{To introduce existing problems in tracking analyses presented
with examples of classical techniques. }
\item{To illustrate the complexity of problems and areas that require
more sophisticated solutions than traditional techniques. The
problems presented here illustrate the need for solutions that
come later in the thesis.}
\item{To present a flexible and readily customized software package as
a framework for classical analyses and starting point for more
sophisticated analyses.}
\item{To explain the compromises that are often made with regard to
data representation and storage, as dictated by traditional
systems, rather than an expressive model of the problem
represented by the spatial and temporal data. }
\item{To encourage the use of techniques for automatic data
validation, spatial and temporal data storage and integration with
database and GIS technologies.}
\end{enumerate}
%To explain the compromises that are often made in terms of data
%storage and computation, rather than in terms of the modelling
%problems presented by spatial and temporal data.
\section{\proglang{R} and the \pkg{trip} package}
The software package \pkg{trip} developed with this chapter provides
an integrated system for data validation and a development framework
for track analyses. This can be used as a launching point for further
analysis such as validating input to Bayesian methods, or filtering
for state-space models \citep{patterson2010using}. As an extension of
the \proglang{R} environment, \pkg{trip} also provides integration
with tools for data access and output, integration with GIS and other
data systems, and metadata for projections and coordinate systems. The
\pkg{trip} package ties together numerous tracking analysis
techniques, which previously were only available through a wide variety
of disparate tools, each having various requirements and limitations.
% The
% \pkg{trip} package was developed because not everything required by
% tracking analysis was available in one place, with many disparate
% tools having very different requirements and other limitations.
The \pkg{trip} package was developed within the freely available
software platform \proglang{R}, a statistical programming environment
consisting of a vast community of contributing developers and users
\citep{R}. \proglang{R} is organized into modules known as
\emph{packages} which provide the functionality of the language, and
also the mechanism by which it is extended\footnote{See
\url{http://en.wikipedia.org/wiki/Package_Development_Process}.}. New
packages are created using the same tools by which R itself is built
and can be contributed to a public repository such as the
Comprehensive R Archive Network (\texttt{CRAN}\footnote{See
\url{http://en.wikipedia.org/wiki/Software_repository}.}). The
repository system for contributed packages is one of the great
strengths of \proglang{R} and is part of the reason for its ease of
use and subsequent popularity. The spatial and temporal capabilities
of \proglang{R} are advanced, including strong integration with other
technologies such as databases, GIS software and the wide variety of
spatial and other complex data formats.
The data coercion paradigm in \proglang{R} is very powerful, allowing
rather different packages to share data with tight integration. There
are some fundamental data representations required for spatial
analysis and careful organization is needed to provide coercions
between different types to get all the tools to work together. The
spatial infrastructure used to create the \pkg{trip} package is
described with examples of using the software in Section
\ref{sec:tripdemo}.
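As a brief, hedged sketch of this coercion style (the data frame
\texttt{d} and its column names here are hypothetical, not taken from the
data used later), a raw table of records can be promoted to an
\pkg{sp} points object and then declared as a \texttt{trip} by naming
its time and ID columns:
<<coercion-sketch, eval=FALSE>>=
library(sp)
library(trip)
## hypothetical raw records: longitude, latitude, date-time and trip ID
d <- data.frame(lon = c(158.9, 159.3, 160.1),
                lat = c(-54.5, -55.0, -55.8),
                gmt = as.POSIXct("1999-01-01", tz = "GMT") + c(0, 3600, 7200),
                id  = "example")
coordinates(d) <- ~lon+lat                      # now a SpatialPointsDataFrame
proj4string(d) <- CRS("+proj=longlat +datum=WGS84")
tr <- trip(d, c("gmt", "id"))                   # validate as a trip object
summary(tr)
@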
%Chapter2 - ensure that AIMs are specified up front:
% 1. aim to introduce what is to come later in thesis, in terms of
% existing problems and what can and cannot be done
% 2. trip as a framework for classical analyses and launchpad for more
% sophisticated (e.g. validated input to SS models)
% - validation, launching point
% - data integration, data access
% - data output
% - development framework
% - projections/coordinate systems/ GIS
% I explore
% some of the existing approaches and the types of studies with the aim
% of determining the general features of all, and how many of the
% existing techniques only partly solve some problems. I talk about the
% under-use of GIS and projections in tracking applications, show how
% some difficulties could benefit from a wider understanding, and
% finally outline the need for an integrated approach to tracking
% problems with full models of both animal movement and data collection.
\section{Problems with location estimation}
\label{sec:locationproblems}
This section presents actual tag location estimates to illustrate
common problems associated with track data. The location data were
provided by System Argos, which is a worldwide satellite service that
provides location estimates from small mobile transmitters.
The first example is a sequence of Argos position estimates for a
single elephant seal in Figure~\ref{fig:RawArgos}. All raw estimates
provided are shown, with the longitude and latitude values transformed
to an equal-area projection and drawn connected as a line. While there
is obvious noise in the track, the general sequence of events is
clear: the seal leaves Macquarie Island, swimming predominantly
south-east to the Ross Sea where it spends several weeks, and then
returns via a similar path in reverse.
<<print=FALSE,echo=FALSE>>=
suppressMessages(library(mgcv))
invisible(capture.output(library(deldir)))
invisible(capture.output(library(spatstat)))
library(foreign)
library(lattice)
library(sp)
library(trip)
suppressMessages(library(rgdal))
suppressMessages(library(spatstat))
invisible(capture.output(library(maptools)))
suppressMessages(library(geosphere))
@
\begin{figure}
\begin{center}
<<echo=FALSE,print=FALSE,fig=TRUE>>=
load("RawArgos.Rdata")
cols <- bpy.colors(length(cols), cutoff.tails = 0.2)
lmat <- matrix(1, 3, 3)
lmat[1,3] <- 2
layout(lmat)
plot(crds, type = "n", axes = FALSE, xlab = NA, ylab = NA, asp =1)
bg.col <- "white"
usr <- par("usr")
rect(usr[1], usr[3], usr[2], usr[4], col = bg.col, border = NA)
plot(world, add = TRUE, col = "grey")
plot(macq, add = TRUE, col = "black")
plot(grat, add = TRUE)
for (i in 2:nrow(crds)) {
lines(crds[(i-1):i, ], col = cols[i-1], lwd = 2)
}
aa <- legend(x = -1118299.11849875, y = -566890.365414213,
legend = format(round(seq(min(tr$gmt), max(tr$gmt), length = 8), "days"), "%d %B"),
col = bpy.colors(7), lwd = 2, bg = "white", title = " date (1999)")
r <- matrix(bpy.colors(7, cutoff.tails = 0.2), 7, 1)
rasterImage(r, aa$rect$left + aa$rect$w/12, aa$rect$top - aa$rect$h + aa$rect$h/11,
aa$text$x[1] - aa$rect$w/12, aa$rect$top - aa$rect$h/7)
box()
text(-1104281, 950000, "Macquarie Island")
text(177346.2,-1354524, "Ross Sea")
op <- par(mar = c(0, 0, 4.1, 2.1))
bb <- bbox(world) + matrix(9e6 * c(1, 1, -1, -1), 2)
plot(0, type = "n", xlab = NA, ylab = NA, xlim = bb[1,], ylim = bb[2,], axes = FALSE)
usr <- par("usr")
rect(usr[1], usr[3], usr[2], usr[4], col = bg.col, border = NA)
plot(world, axes = FALSE, xlab = NA, ylab = NA, xlim = bb[1,], ylim = bb[2,], col = "lightgrey", add = TRUE)
plot(grat10, add = TRUE, col = "grey")
box(lwd = 2)
for (i in 2:nrow(crds)) {
lines(crds[(i-1):i, ], col = cols[i-1], lwd = 1)
}
par(op)
## crds1 <- coordinates(tr[tr$seal == unique(tr$seal)[3], ])
## #plot(project(crds1, p4), type = "l")
## plot(crds1, type = "l")
## for (i in 2:nrow(crds1)) {
## xy <- crds1[(i-1):i, ]
## pg4 <- paste("+proj=", "gnom", " +lon_0=", xy[1,1], " +lat_0=", xy[1,2], " +over", sep = "")
## xy <- project(xy, pg4)
## xy <- project(cbind(seq(xy[1,1], xy[2,1], length = 20), seq(xy[1,2], xy[2,2], length = 20)), pg4, inv = TRUE)
## #xy <- project(xy, p4)
## lines(xy, col = "red")
## }
@
\end{center}
\caption{Raw Argos estimates for a Macquarie Island elephant seal.
The line connecting the points is coloured from blue through purple
to yellow in order relative to the time of each position. (Macquarie
Island can just be seen in black beneath the third dark blue line
segment). The outward and inward journeys are much faster than the
journey throughout the Ross Sea, as shown by the colour scale
change. A graticule is shown for scale, 5 degrees on the main plot,
and 10 degrees on the inset. }
\label{fig:RawArgos}
\end{figure}
% \begin{figure}[!ht]
% \begin{center}
% \includegraphics[width=120mm]{roughfigures/rawC993Argos.png}
% \end{center}
% \caption{
% {\bf Raw Argos estimates joined by a line}
% }
% \label{FigrefRawArgos}
% \end{figure}
There are a number of problems with the location estimates, some that
are very obvious, but others that are more subtle. First, some of the
dog-legs in the path seem very unlikely.
%% Perhaps index these on the figure?
On the outward journey the blue path shows some lateral movement to
the west and east, and just before the return to Macquarie Island
there is a similar movement to the west, then east. These are obvious
problems seen as noise in the track, with implausible positions that
do not otherwise obscure the general pattern of
movement. Other dog-legs in the path are less extreme, so are more
plausible.
Another problem is that there are locations that are well within the
land mass of Antarctica. For a marine animal, these locations are
clearly incorrect but as with the track dog-legs there are similar
issues with levels of plausibility that are difficult to evaluate. A
location on land may be plausible if it is near enough to the coast,
though this can interact with further issues requiring
interpretation. These include that the start and end locations are not
exactly at the known ``release'' and ``re-capture'' site, which was
the isthmus at Macquarie Island. This isthmus is quite narrow and is
readily crossed by elephant seals, though regions of the island to
either side of the isthmus can only be traversed by sea. Another
section of the track at the beginning has the path of the animal
crossing Macquarie Island itself. At the scale of this image this
inaccuracy seems unimportant since the island is such a small area
within the study region. However, if the region of land concerned were
a much larger peninsula then the problem of choosing which region was
actually visited would remain.
A scheme that proposes to remove or correct extreme positions faces
the problem of defining appropriate thresholds. ``Extreme'' dog-legs
cannot simply be discarded as the question of what is ``too-extreme''
does not have a simple answer. A simple rule to discard any location
on land will not work since these animals do actually visit
coastal regions. The distance that a seal might travel inland is not
very far, but depending on the species studied and the environment the
situation may not be so clear-cut.
There are other issues of plausibility. For example, the coastline in
the figure is quite coarse and while it may be sufficient for the
scale of the image it does not represent the actual coastal boundary
available to the seal. The real coastline is far more tortuous,
detailed and dynamic---and may be significantly different from a
particular data set due to the current fast- or sea-ice cover. This is
a general issue with any data set available for informing models of
animal location---the assumptions and limitations must be understood
and used appropriately.
In terms of the incorrect first and last positions in Figure
\ref{fig:RawArgos}, these could be updated to be the actual release and
recapture sites, but it might be more correct to actually add those
locations and times to the start and end of the sequence of
records. This is a data consistency issue that leads to the next
family of problems in track data.
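One hedged way to do the latter, assuming a data frame \texttt{d} of raw
records and deployment details that are purely illustrative here, is to
append the known release and recapture positions as extra records before
any further processing:
<<append-known-endpoints, eval=FALSE>>=
## illustrative values only: known release and recapture details
release   <- data.frame(lon = 158.95, lat = -54.50,
                        gmt = as.POSIXct("1999-01-10 04:00", tz = "GMT"),
                        lq = NA, id = "example")
recapture <- data.frame(lon = 158.95, lat = -54.50,
                        gmt = as.POSIXct("1999-04-22 23:00", tz = "GMT"),
                        lq = NA, id = "example")
## 'd' is assumed to have the same columns as these records
d <- rbind(release, d, recapture)
d <- d[order(d$id, d$gmt), ]      # keep records ordered by trip ID and time
@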
%[We could begin describing a model of what data sources are acceptable
%to inform what we know about true location]
%% MH: I quite like this idea....but lets leave it to one side for now
\subsection{What is a trip?}
%% \subsection{Terminology for tracking data}
%% Underlying the actual track data that we work with is an idealized
%% model of the actual path of the animal. This is a continuous line in
%% three dimensional space parameterized by time.
There are a number of practical issues associated with the organization
of track data that can present more prosaic problems. This section
discusses some terminology and suggests the use of a ``trip'' as the
unit of interest, one that can be defined with database-like validation
restrictions. The idealization of an animal's trip is a continuous
line in four dimensional space-time that is perfectly accurate as a
representation of the animal's position. For practical reasons, this
ideal case can only be represented as a series of point samples of
uncertain accuracy, in two or three spatial dimensions parameterized
by time.
Animal tracking can be carried out in a variety of ways, here
restricted to the broad class of data derived from ``tagging''. A
``tag'' is a device attached to an animal that is used to directly
sense and record data or that is indirectly detected by a remote
sensing system. For the purpose of the current discussion, refer to
the first type of tag as ``archival'' and the second type as
``remotely sensed''. Reviews of the practical methods for tagging in
the broader context of biotelemetry for marine and terrestrial species
are provided by \cite{Cooke2004}, \cite{WGSK02} and
\cite{Kenward:VHF}.
Archival tags record and store data that is later retrieved from the
tag, while remotely sensed tags emit an electronic or acoustic signal
that is detected by an installed system of sensors, such as a
satellite or acoustic array. (This categorization is not always
maintained in practice, as archival tags may be satellite-linked in
order to upload data in near real-time, but for the purpose of
location estimation the distinction holds for the types of available
data).
A loose set of definitions then is:
\begin{description}
\item[tag] {the device put on the animal.}
\item[track data] {any temporally referenced location data resulting
from a device attached to an animal.}
\item[trip]{ a specific tracking ``interval'' where a tag is attached,
the animal is released for a time, and the tag (or just its data)
is eventually retrieved. A trip may be represented in
some way with track data that has some quality control or
modelling applied.}
\end{description}
Data resulting from the tagging process are identified by the device
ID, as well as by the identity of the individual animal the tag is
attached to. The same tag might be put on different animals, or the
same animal might have been tracked on two separate occasions. For
central-place foragers, a single tagging event may involve the same
animal leaving the tagging site and returning multiple times. Once the
interval of interest is defined it can be regarded as a trip. A single
leave / return or a tag deployment / retrieval may be referred to as a
trip. Whether multiple leave / return events are considered within a
single trip depends on the research question---migratory animals
usually won't have such clear trip boundaries as central-place
foragers, for example. The difference is important as the behaviour of
interest for central-place foragers is primarily between leave /
return events, and the return event is usually the only opportunity to
retrieve a tag. Finally, there may not be data for the entirety of
the trip of interest due to tag loss or memory restrictions, and so a
trip may require the inclusion of sections where location is uncertain
or completely unknown.
For the current discussion, define a trip to coincide with the
interval of interest for which there is useable data from the tagging
process. ``Tracks'' or ``track data'' then are just a set of location
and other data, with no particular organization or quality control.
\subsection{Practical data issues}
\label{sec:tripdef}
The minimum organization and quality control for trip data involves
the ordering and relation between data records. The ordering of
records is perhaps inconsequential, as there is the inherent order of
the date-time value stored for each location, but this may reveal more
basic errors in the data. There must not be occurrences of duplicated
date-time records within a single trip, although duplicated locations
in subsequent records are acceptable. Duplicates in time do not make
sense since either they are redundant copies of a previous record, or
there is an implied infinite speed. Such duplicates are common in Argos
Service data; an example is found in the \texttt{seal} data set provided
by \cite{argosfilter}, which is used in Section
\ref{sec:extendingtrip}.\footnote{Another recent example of duplicated
times in a GPS data set is discussed here:
\url{http://lists.faunalia.it/pipermail/animov/2010-August/000635.html}}
Analytical methods sometimes apply a non-zero time difference
arbitrarily to avoid divide-by-zero errors. Less serious is the issue
of successive duplicate locations, but care must be taken when
calculating metrics such as speed based on inter-point distances. Each
of these cases should be investigated carefully in case they hide
errors from other causes such as mistaken data handling.
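A hedged sketch of such a check, for a hypothetical data frame
\texttt{d} with columns \texttt{id}, \texttt{gmt}, \texttt{lon} and
\texttt{lat}, flags both kinds of duplicate for inspection rather than
silently dropping them:
<<duplicate-checks, eval=FALSE>>=
d <- d[order(d$id, d$gmt), ]
## date-times repeated within a single trip ID (not valid for a trip)
dup.time <- duplicated(paste(d$id, d$gmt))
## successive records with identical coordinates (valid, but worth checking);
## for brevity this ignores boundaries between trip IDs
same.xy <- c(FALSE, diff(d$lon) == 0 & diff(d$lat) == 0)
d[dup.time | same.xy, ]    # inspect these before deciding what to remove
@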
Missing values must also be handled carefully. Location and time
coordinates cannot be used if they are missing or non-finite, even if
their record appears in the correct order. Missing values can arise in
a number of ways---infinities or undefined numeric values from numeric
errors, or out of bounds coordinates, transformation errors, data
missing from a regular sequence---and the exact reasons need to be
carefully understood.\footnote{A natural assumption is that recorded
values of date-time are correct beyond question: so there is some
information even if one of the spatial coordinate values is
missing. This issue is a corollary to the use of filtering
techniques that remove locations from track data or otherwise
correct spatial positions. If there is a date-time why not
interpolate or otherwise estimate missing spatial coordinates?}
This is a different approach to that taken by
\cite{calenge2009concept}, who explicitly allow missing coordinates as
part of ``trajectories''. This is most pertinent in the context of
tracks of regular time intervals where a missing point can be
significant in terms of interpretation. The definitions here are not
intended to determine which approach is more appropriate and there is
no reason the two rationales cannot co-exist, but the current
implementation in the \pkg{trip} package disallows missing
coordinates.
From a programming perspective, the use of rigid classes (definitions)
with validity checking can significantly reduce the time wasted
solving these problems \citep{chambers1998programming}. Based on the
above, the minimal data-consistency preparation required can be
achieved in the following way. Read all records, sort by trip ID then
date-time, remove duplicated records or records with missing or
non-numeric spatial or temporal coordinates. (The definition of
``invalid'' for a coordinate may involve out of bounds values such as
those for longitude and latitude, but this step only refers to the
data values, not their interpretation). Remove or adjust any records
with duplicate date-times within a single trip ID. Up to this point no
interpretation has been applied to the data---this will provide a
useable set of records that can pass minimal validation but each step
should be carefully investigated to ensure that automated decisions
are not introducing new errors.
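A hedged sketch of that preparation, again assuming a data frame
\texttt{d} with columns \texttt{id}, \texttt{gmt}, \texttt{lon} and
\texttt{lat}, might be written as follows, with each step inspected on
real data before the result is accepted:
<<minimal-validation, eval=FALSE>>=
## 1. sort by trip ID, then date-time
d <- d[order(d$id, d$gmt), ]
## 2. drop records with missing or non-finite coordinates or times
d <- d[is.finite(d$lon) & is.finite(d$lat) & !is.na(d$gmt), ]
## 3. drop exact duplicate records
d <- d[!duplicated(d), ]
## 4. drop (or otherwise adjust) duplicated date-times within a trip
d <- d[!duplicated(paste(d$id, d$gmt)), ]
## the result should now pass the validation applied by the trip() constructor
coordinates(d) <- ~lon+lat
proj4string(d) <- CRS("+proj=longlat +datum=WGS84")
tr <- trip(d, c("gmt", "id"))
@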
One way to adjust duplicate times is to simply modify the values
forward or back by a small amount, but this can be problematic
depending on the time differences involved. Duplicated
times more likely indicate a problem with the data itself and should
be investigated.
Other problems in this regard concern the sensible representation of
movements in a particular coordinate system. The most commonly used coordinate
system for tracking data is longitude and latitude on the WGS84
datum. For animals that traverse hemispheres and cross critical
meridians such as the Pacific Ocean dateline (longitude 180 W / 180 E)
or the prime meridian (longitude 0) a continuous path must be
represented appropriately, such as longitudes in [-180, 180] or [0,
360] respectively. Many species will cross both these critical
boundaries and so representing simple lines requires a smarter choice
of map projection. All map projections have these regions of
non-optimal usage and so the focus should be on intelligent choice of
projection using tools that provide easily applied transformations.
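As a hedged illustration (the objects and the projection string are
example choices only), longitudes can be wrapped into [0, 360) for a
Pacific-centred view, or the points transformed to a regional
projection with \pkg{rgdal}:
<<wrap-and-project, eval=FALSE>>=
library(sp)
library(rgdal)
## wrap longitudes from [-180, 180] into [0, 360) so a dateline-crossing
## track can be drawn as a continuous path
lon360 <- d$lon %% 360
## or transform the points to a projection centred on the study region
## (the PROJ.4 string is only an example choice)
pts <- SpatialPoints(cbind(d$lon, d$lat),
                     proj4string = CRS("+proj=longlat +datum=WGS84"))
pts.laea <- spTransform(pts, CRS("+proj=laea +lat_0=-60 +lon_0=180"))
@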
\subsection{Joining the dots}
\label{sec:joindots}
A further problem is the common practice of joining points with
``straight lines''. Usually the only available data are temporally
referenced point locations, and lines are artefacts introduced for
visual purposes. However, using these lines is quite artificial, and
can become error prone when used quantitatively. Joining the points
imposes a specific model of behaviour, namely that the path is a
straight line between points.
This is not correct on several levels. First, the animal is moving in
three spatial dimensions not two, and the movement in this third
dimension is quite significant for diving animals, though it may be
largely ignored for many flying or surface dwelling species. Second,
even if the points represent accurate positions for the animal the
line joining them most certainly does not represent the intermediate
path correctly. The animal could be traversing either side of the
line, or taking a far longer, more convoluted path. Third, the
coordinate system used to interpolate the intermediate positions can
have a large effect on the outcome. ``Straight-line'' movement is
usually assumed, but what is drawn as a straight line on a map has a
very different meaning depending on the coordinate system or map
projection used. For most coordinate systems shorter step lengths will
be closer to the ``great circle'' path, but the nature of the
deviation will also depend on the region traversed.
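The effect can be seen with a hedged sketch using the \pkg{geosphere}
package, comparing the straight segment drawn in longitude--latitude
coordinates with intermediate points along the great circle for a pair
of illustrative positions:
<<greatcircle-sketch, eval=FALSE>>=
library(geosphere)
p1 <- c(158.9, -54.5)   # illustrative positions near Macquarie Island ...
p2 <- c(175.0, -72.0)   # ... and in the Ross Sea
gc <- gcIntermediate(p1, p2, n = 50, addStartEnd = TRUE)
plot(rbind(p1, p2), type = "l", lty = 2,
     xlab = "longitude", ylab = "latitude")
lines(gc, col = "red")  # the great-circle path bows away from the 2D segment
@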
Joining points with a line guides the eye helpfully to show the
sequence of points, and the mind can often overlook problems of
inaccuracy to see roughly what actually happened. It is this mental
capacity for reducing noise and seeing the overall picture of events
that sophisticated models of track analysis and movement aim to
replicate in an objective way. When our minds provide these ideas they
do so by applying knowledge of the physical system: an animal swimming
through water over great distances, an animal that will tend to travel
quickly to an area of interest, then spend time in localized regions,
an animal that will not venture far from the coastline to areas
inland, etc. ``An effective EDA [Exploratory Data Analysis] display
presents data in a way that will make effective use of the human
brain's abilities as a pattern recognition device''
\citep{maindonald2007data}.
There is no end to this problem when dealing with points or line
segments themselves as the entities of interest. If a particular
position is just slightly wrong, and its neighbouring points also a
little inaccurate then any assessment of the distance from one point
to another or the intermediate path taken between points is thrown
into question.
\subsubsection{Treatment of spatial and temporal data in modern software}
\label{sec:software}
The temporal nature of track data stems from the fact that the
physical process of animal movement is a continuous path. This
continuous process is only measured by discrete samples and so the
data are inherently discontinuous. However, treatment of time in
software is rarely truly continuous but rather applied as a sequence
of ``time slices''. This is a legacy limitation that unfortunately
matches the way in which track records are usually measured and
stored. To choose a high-profile example, animations of tracks in
Google Earth \citep{googearth} show sequences of discrete line segments
that are progressively revealed or hidden as the slider intersects the
time spans of the segments. Why is the line not represented as a
continuously changing entity, with extents that match the slider's
extent exactly? Partial line segments could be shown, and the line
shown continuously without being so bound to its input points. This is
a problem not only for Google Earth but a true limitation in the
representation of most spatial data in GIS and GIS-like
visualizations.
This must be part of the reason why tracking analysis is rarely
tightly coupled with GIS---analytically (if not visually) track data
is treated as continuous or near-continuous, with more information
than the static line segments joining subsequent points. Also track
data is routinely processed based on great circle travel (assuming
that's how the animal would choose to move) but then presented
visually in a simple 2D longitude by latitude plot. Map projections
provide visualizations that help prevent our brains from giving us the
wrong message about distance and area on a simple plot. Ultimately a
4D visualization on a globe may be a ``truer'' way to visualize track
data, but though current tools such as WorldWind and Google Earth will
draw lines along great circles they are not well suited to track data
that varies continuously in time.
GIS traditionally provides complex shapes such as multi-segment lines
with multiple branches, or polygons with branched holes and islands
but support for a third coordinate value for elevation is rare and
time is usually non-existent.\footnote{Polygons are literally
incapable of representing continuous 2D topological surfaces in
3(+)D geometric space and the special status of planar polygons that
imposes this limitation surely will eventually be transcended by
future GIS technology.} Though routine computer graphics in games
provides complex surfaces and lines composed of primitive elements
with incredibly complex representations and interactions, it is rare
to find treatment of track data as a multi-part line object, let alone
with fine control over the continuous span of a single line. Modern
GIS in its most commonly understood terms is not easily extended for
temporal data, but provides an excellent platform for dealing with
data sources, geometry and gridded data, and working with projections.
The availability of data manipulation and analysis tools is a major
factor in the effective use of animal tracking data for ecological
studies. While there are many analytical techniques and a wide array
of software applications, some lack the flexibility required or are
restricted by cost or the required expertise to use them
effectively. For some purposes track data needs to be represented as
points, and for others as lines, or even better as probability density
functions. Tools for seamless conversion between these data structures
are practically non-existent for everyday research.
An illustrative example of the limitations of GIS data structures is
seen when attempting to represent track data. As points, the geometry
is stored as X and Y coordinates (and, rarely, with Z
coordinates). Time is relegated to the attribute table and even then
is not always supported completely by common GIS interchange
formats.\footnote{The obscure ``measure'' value for a fourth
coordinate in shapefiles is sometimes used for time, but was not
designed for it and is rarely supported by software packages.} GIS
supports more complex geometry than simple points requiring more than
one vertex: lines, polygons and ``multipoints''. It should be simple
to store a track in either a ``point'' or ``line'' version, but for
lines each line segment is composed of two vertices so there is no
longer a simple match between a point's date-time (or other)
coordinate and those of the line. The line is represented as a single
object with multiple X and Y vertices with only a single attribute
record, or as a series of line segments composed of two X and Y
vertices each. Neither version provides a clean translation of even
very simple track data to a GIS construct.
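A hedged sketch with \pkg{sp} classes shows the mismatch for a
hypothetical track \texttt{d}: each point can carry its own date-time,
but as a line the vertices collapse to a single feature with one
attribute record (here only the start and end times survive):
<<point-versus-line, eval=FALSE>>=
library(sp)
## as points, each vertex keeps its own date-time in the attribute table
pts <- SpatialPointsDataFrame(cbind(d$lon, d$lat), data.frame(gmt = d$gmt))
## as a line, all vertices belong to one feature with one attribute record
ln <- SpatialLinesDataFrame(
  SpatialLines(list(Lines(list(Line(cbind(d$lon, d$lat))), ID = "trip1"))),
  data.frame(start = min(d$gmt), end = max(d$gmt), row.names = "trip1"))
@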
Access to the true continuous nature of track data is simply not
provided by common software tools. This is a problem that extends to a
general definition of topology versus geometry, for representing
objects flexibly in a chosen space but discussion of that is out of
scope here. \cite{hebblewhite2010distinguishing} highlight the need
for ecologists to become more adept at matching temporally varying
environmental data to animal movement data. There are emerging
technologies that allow for a separation of geometry and topology,
unlimited coordinate attributes on GIS features, and generalizations
of this are within the scope of general database theory
\citep{geojson,beegle-pelagic,pauly-keeping,anderson2010voyager,fledermaus}.
%also
%\url{https://www.ivs3d.com/news/PID985675.pdf}
%\url{http://proceedings.esri.com/library/userconf/proc06/papers/papers/pap_1848.pdf}
%\url{http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.167&rep=rep1&type=pdf}
%\url{http://books.google.com.au/books?hl=en&lr=&id=g1puIWGZmWEC&oi=fnd&pg=PA3&dq=related:4bBs6wCXz0gJ:scholar.google.com/&ots=iRrhdBkLs7&sig=EQAMrCJwmXvBADuX4Qb9X8ial2g}
% - lack of non xy coordinates
% - limited use of spatial statistics constructs like ``window''
% - CRAN package asbio
% A final listing of some available software tools? [Software for
% available Filters are mentioned in Destructive Filters below, also in
% gridding methods] (see comments in tex)
%% see here for stuff on scale problems:
% Review
% Distinguishing technology from biology:
% a critical review of the use of GPS
% telemetry data in ecology
% Mark Hebblewhite1,* and Daniel T. Haydon2
% ``We now have incredibly fine-scale data on animal movements, but lack
% data at the same resolution (i.e. grain size) about what resources
% were available to them or their behavioural state in a similarly
% fine-scale way. Instead, we often build sophisticated models to ‘test’
% between the importance of ‘habitat’ and other factors driving
% movements of animals where we pair data on hourly movements with a
% coarse-grained and static permanent ‘map’ of landcover resources
% (Dalziel et al. 2008; Frair et al. 2010). Ecologists should become
% better in matching temporally varying estimates of resource
% availability at the same time scale as animal movements. While a
% daunting task, such data are available at some finer grain sizes. The
% availability of fine temporal (8 day) and spatial (250 m2) remotely
% sensed data from satellites such as MODIS (Moderate Resolution
% Infrared Satellite; Huete et al. 2002) now provide ecologists with
% ready information on forage biomass, terrestrial and aquatic net/gross
% primary productivity, and snow cover that can be matched temporally
% with GPS data (Huete et al. 2002; Running et al. 2004; Hebblewhite
% 2009). Urbano et al. (2010) describe sophisticated database management
% systems to help ecologists link animal and environmental data. The
% power of coupling of satellite technology on animal movements together
% with resource availability is self-evident, but, as yet, relatively
% few studies have attempted to harness it.''
% See this new paper:
% %\url{http://www.research4d.org/publications/HabitatSpace.pdf}
% and related
% %\url{https://abstracts.congrex.com/scripts/jmevent/abstracts/FCXNL-09A02a-1710252-1-cwp4c02.pdf}
% [Hengl's resource, Animov, adehabitat, MoveBank, SeaTurtle.org, etc.]
% Existing tools and integration with GIS.
% Software survey - Matlab libraries, MamVis, R-Sig-Geo, adehabitat,
% AniMov.
% argos-tools
% adehabitat
% trip
% argosfilter
% track-add-in Manifold
% Eonfusion
% [Scripts in IDL at AAD, timeTrack]
% tools
% formats
% files
% databases
% %% under-use of this basic knowledge, need for database models, better
% %% data storage, class definitions and validation
% What does traj do? How is it handled, particularly in light of sp?
% Talk about how Austin, Douglas etc. and co. do some of this - but not
% very accessible. How did DJW do it?
% I am compiling a list of available software (both proprietary and open-source) that analyzes animal movements. I was wondering if list members know of software that I am not aware of.
% So far I am aware of the following R packages:
% adehabitat
% BBMM
% Crawl
% tripEstimation
% And the following free software:
% Hawth's Tools
% Alana Ecology Home Range Tools for ArcGIS
% Animal Movement Analysis ArcView Extension
% QGIS
% trip (I'm the author), argosfilter and diveMove packages also have some movement functions / support for track data.
% Here is a rough list I have collected, let me know if I can provide more detail for any of it.
% ------------------------------------------------
% Matlab libs
% (e.g. IKNOS: http://bio.research.ucsc.edu/people/costa/people/tremblay.html)
% MamVis http://www.smru.st-andrews.ac.uk/MamVisAD/
% STAT (Satellite Tracking and Analysis Tool)
% www.seaturtle.org/tracking/STAT_biologging2.pdf
% Vilis Nams has some software
% http://nsac.ca/envsci/staff/vnams/index.htm
% Argos Tools for ESRI
% http://www.spatial-online.com/ARGOSTools.htm
% Eonfusion www.eonfusion.com
% (not track-specific but has very general data structures, allow continuous-time tracks and general multi-dimensional data)
% Track add-in for Manifold System (creating lines from points in GIS):
% http://forum.manifold.net/forum/t67287.2
% Various tag manufacturers have software for their tags,
% e.g. Wildlife Computers, Lotek, SMRU, Sirtrack, Vemco, Microwave Telemetry, etc.
% Sascha Frydman's AT SEA (a ref to it is here, I cannot find anything else):
% http://www.smru.st-andrews.ac.uk/Biologging/Abstractbook_final.pdf
% Argos Tools
% (There was an old site for "argos-tools" that I cannot find now)
% There is a set of IDL code, originally from Dave Watts at the Australian Antarctic Division
% (used to be here under "Geographic Information Systems", I can dig it up and the user guide if need be: http://www.zoo.utas.edu.au/awru/AWRU1020.htm)
% http://www.anatrack.com/
% Sites
% -----------------------------------------------
% seaturtle.org
% movebank.org
% The Ocean Tracking Network, http://www.oceantrackingnetwork.org/index.html
% http://www.soest.hawaii.edu/PFRP/overview.html
% http://www.ccom.unh.edu/vislab/index.html
% General
% ------------------------------------------------
% GMT
% SeaMap %\url{http://seamap.env.duke.edu/}
% %\url{spatial-analyst.net}
\subsection{Summary of problems}
The main problems can be described as a set of overlapping issues:
\begin{description}
\item[Inaccurate sampling] Position estimates are inaccurate, with
some unknown relation to the true position.
\item[Irregular and incomplete sampling] Position estimates represent
discrete samples from an animal's continuous path. These may be at
irregular time intervals with large gaps in the time series, and no
ability to control this because of practical limitations.
\item[Incomplete paths] Paths composed of too few positions,
inconsistent motion and assumptions about straight line movement.
\item[Unlikely dog-legs] Abrupt out-and-back deviations that imply
  implausibly erratic movement.
\item[Simplistic models of movement and residency] Intermediate locations are shown by joining the
dots, using an assumption of direct linear motion between estimates.
\end{description}
Many traditional analyses of modern track data deal with these
problems by chaining a series of improvements in an \emph{ad hoc} way,
and the need for better approaches is well understood
\citep{breedthesis,patterson2008state}. Incorrect positions are
removed by filtering, based on speed, distance, angle, attributes on
the location data or spatial masking. Positions are updated by
correction with an independent data source, such as sea surface
temperature (SST) or the need to traverse a coastal boundary. Unlikely
dog-legs are removed by filtering, or ``corrected'' by smoothing the
line. Smoothing is also used to augment small samples, by
interpolating along a smooth line, or smoothing positions into a 3D
grid partitioned by time. There are further requirements for smoothing
to estimate latent locations or to match disparate spatial and
temporal scales.
Many of these techniques have their own problems, compounded when
these operations are chained one after the other. Models of the
process may be overly simplistic (linear movement between estimates),
or applied inconsistently---positions are removed, then estimates are
smoothed, or compared with other data to correct or update them. Later
chapters present new methods for incorporating these issues in a more
integrated way.
\section{Summarizing animal behaviour from point-based track data}
This section revisits some of the problems presented previously and looks at the
details of algorithms used. The techniques are useful for first-pass
summaries, or exploring ideas, but they rely on simplistic models and
are difficult to integrate sensibly.
Putting aside the limitations mentioned earlier and the fact that
there is no clear basis for deciding which combination of tests should
apply, some of the issues can be illustrated further by proceeding
from simple to more complex filters.
\subsection{Filtering}
Filtering is used to remove or modify data in some way based on
metrics available or calculated from the track data. Destructive
filters categorize locations for removal from the
trip. Non-destructive filters update the location data for some
positions. Again there is no clear distinction between these two types
as a filter can be used to discard some locations entirely, update
others and interpolate new locations for various purposes.
At the simplest level, destructive filtering involves categorizing
each point for removal or retention. An example is a ``land mask''
that would deal with the issue of the points on the Antarctic
continent as discussed in Section~\ref{sec:locationproblems}. A land
mask filter would determine any points that fall on land, mark them
for removal and discard them. The filter is very simple as it is a
basic check for each point individually, with no interaction of its
relationship to other points or other data sources. All points that
fall on land can be determined and removed in one step, or they could
be checked one after another in any order. The way the filter is
applied will have no impact on the filtered result.
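A hedged sketch of such a mask, assuming a \texttt{SpatialPoints} track
\texttt{pts} and a \texttt{SpatialPolygons} land layer \texttt{land} in
the same longitude--latitude coordinate system (both hypothetical here),
uses \texttt{over()} from \pkg{sp}:
<<landmask-sketch, eval=FALSE>>=
library(sp)
## points falling inside any land polygon return a non-missing polygon index
on.land <- !is.na(over(pts, land))
pts.sea <- pts[!on.land, ]    # retain only the at-sea positions
@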
A more complex case applies recursion, where once some points are
removed the status of the test for remaining points may have changed
and so must be determined again. Metrics on successive locations
fundamentally rely on the order and relation between points, and so
once points are removed the calculations must be repeated until the
desired metric is reached for all retained points. Existing filters
apply measures such as Argos location quality, distance between
successive points, speed of movement, turning angle and land masks. A
classic speed filter in marine animal tracking is a recursive rolling
root-mean-square speed filter by \cite{MCF92}. This filter is widely
used and widely cited especially in marine applications.
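The \pkg{trip} package provides an implementation of this approach as
\texttt{speedfilter()}; a hedged usage sketch for a \texttt{trip} object
\texttt{tr} (the threshold is chosen for illustration only) is:
<<speedfilter-sketch, eval=FALSE>>=
library(trip)
## logical vector, TRUE for positions retained by the recursive speed test
keep <- speedfilter(tr, max.speed = 12.5)
tr.ok <- tr[keep, ]
@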
% Some measure of
% confidence is sometimes available in the to begin with and so the
% simplest filter is to simple select values that are within the
% required range. Probably the most commonly known example is the
% ``location class'' given to Argos Service estimates. These are
% provided with one of seven category labels that represent a measure of
% confidence in the position.(There are a variety of extra sources of
% information available for Argos estimates that will not be discussed
% here, these examples merely aim to illustrate the principles
% involved.)
There is a practically endless number of metrics that can be derived
from the location data that range from the very simple to
complex. However, no matter what combination of decisions are applied,
the main limitation of these methods is their arbitrary nature. They
are applied to a purely geometric interpretation of the data that
largely ignores the physical system being modelled. Much information
goes unused, and what data is used is applied independently of other
sources.
The use of destructive filters is also problematic because data is
discarded and the filter decision is based on the location itself,
rather than the process used to estimate the location. It is hardly
ever mentioned, but the Argos Service estimation process is not
published and therefore not amenable to modelling.
Recursive filters are relatively complicated, but still result in a
categorical decision just as simpler filters like a land mask do---there
is no single number that can be calculated for a given point, and the
implications of minor decisions for a given filter can greatly affect
the result.
\subsubsection{Destructive filtering}
\label{sec:destructivefilter}
Here several destructive filters are demonstrated, removing
points based on a land mask, Argos quality class and speed
calculations. In Section~\ref{sec:tripdemo} the \pkg{trip} package
is used to create a version of a speed-distance-angle filter.
The Argos Service is a global satellite data collection system that
provides an estimate of position based on Doppler-shift signals
transmitted from tags \citep{Argos:MAN}. The basic location product is
a time series of longitude and latitude values with a categorical
measure of location quality that is provided as a label. There is more
information available with the service and guidelines for its use, but
the scope of the following discussion is restricted to the widely used
quality class measure. Location classes take the values ``Z'', ``B'',
``A'', ``0'', ``1'', ``2'', or ``3'' in order of increasing
accuracy. The reported accuracies are 1000 m for ``0'', 350-1000 m for
``1'', 150-350 m for ``2'', and better than 150 m for ``3''
\citep{Argos:MAN}. No estimate is given for ``Z'', ``B'' or ``A''
classes although studies have shown that these can have an acceptable
accuracy \citep{Vincent2002}.
% - class, speed, angle, distance, bbox, polygon/raster masks
%Our simplest filter might then be ``discard any location with a class
%poorer than $x$, where $x$=ZBA0123''.
The first filter removes any points that fall on land, and then any
points that have an Argos class of ``B'' or worse. In
Figure~\ref{fig:LandClassFilter} the two panels show the Argos track
plotted in longitude and latitude coordinates.
\begin{figure}
\begin{center}
<<print=FALSE,echo=FALSE,fig=TRUE>>=
load("RawArgos.Rdata")
tr <- tr[tr$seal == "c026", ]
crds <- coordinates(tr)
sf <- speedfilter(tr, max.speed = 12.5)
crds.f <- crds[sf, ]
w <- spTransform(world, CRS("+proj=longlat"))
tr$lq <- ordered(factor(tr$lq, c("Z", "B", "A", "0", "1", "2", "3")))
## update 2019, overlay was long removed from sp
lf <- is.na(over(SpatialPoints(crds, proj4string = CRS(proj4string(w))), as(w, "SpatialPolygons")))
crds.lf <- crds[lf, ]
qf <- tr$lq > "B"
op <- par(mfcol = c(1,2))
## PLOT the raw data, then colours to show filtered track
plot(crds, type = "n", main = "land filter", xlab = "longitude", ylab = "latitude")
plot(w, add = TRUE, col = "grey")
lines(crds, lty = 2)
text(175, -60, paste(sum(!lf), "points \nremoved"))
cols <- bpy.colors(length(cols), cutoff.tails = 0.2)
for (i in 2:nrow(crds.lf)) {
lines(crds.lf[(i-1):i, ], col = cols[i-1], lwd = 2)
}
crds.qf <- crds[qf, ]
plot(crds, type = "n", main = "class filter", xlab = "longitude", ylab = "latitude")
plot(w, add = TRUE, col = "grey")
lines(crds.lf, lty = 2)
cols <- bpy.colors(length(cols), cutoff.tails = 0.2)
text(175, -60, paste(sum(!qf), "points \nremoved"))
for (i in 2:nrow(crds.qf)) {
lines(crds.qf[(i-1):i, ], col = cols[i-1], lwd = 2)
}
par(op)
@
\end{center}
\caption{Land filter and Argos quality class filter (> B). In each