---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  fig.path = "man/figures/README-",
  echo = TRUE,
  fig.width = 8,
  fig.height = 6
)
```
# trendbreaker
<!-- badges: start -->
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3888494.svg)](https://doi.org/10.5281/zenodo.3888494)
[![R build status](https://github.com/reconhub/trendbreaker/workflows/R-CMD-check/badge.svg)](https://github.com/reconhub/trendbreaker/actions)
[![Codecov test coverage](https://codecov.io/gh/reconhub/trendbreaker/branch/master/graph/badge.svg)](https://codecov.io/gh/reconhub/trendbreaker?branch=master)
[![R-CMD-check](https://github.com/reconhub/trendbreaker/workflows/R-CMD-check/badge.svg)](https://github.com/reconhub/trendbreaker/actions)
<!-- badges: end -->
The *trendbreaker* package provides tools for detecting changes in the temporal
trend of a single response variable. It implements **A**utomatic
**S**election of **M**odels and **O**utlier **De**tection for **E**pidemics
(ASMODEE), an algorithm originally designed for detecting changes in COVID-19
case incidence.
ASMODEE proceeds by:

1. defining a training set that excludes the last *k* data points
2. identifying the temporal trend in the training set by fitting a range of
   (user-specified) models to the data and retaining the model that predicts
   or fits best
3. calculating the prediction interval (PI) of the temporal trend
4. classifying any data point outside the PI as an outlier
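
The four steps above can be sketched in a few lines of base R. This is only a
minimal illustration on simulated data, not the package's implementation: the
candidate models, the use of AIC for model selection, the 95% level and all
object names (`dat`, `train`, `candidates`, `pred`) are assumptions made for
the example.

```r
# Illustrative base R sketch of the ASMODEE steps (not trendbreaker's code)
set.seed(1)

# simulated daily counts with a linear trend (hypothetical data)
dat <- data.frame(day = 1:30)
dat$count <- 100 + 2 * dat$day + rnorm(30, sd = 3)
dat$count[30] <- dat$count[30] + 60 # artificial jump on the last day

k <- 7

# 1. training set: everything except the last k data points
train <- head(dat, -k)

# 2. fit candidate models and retain the one with the lowest AIC
candidates <- list(
  constant = lm(count ~ 1, data = train),
  linear   = lm(count ~ day, data = train)
)
aics <- vapply(candidates, AIC, numeric(1))
best <- candidates[[which.min(aics)]]

# 3. 95% prediction interval of the retained model, for all data points
pred <- predict(best, newdata = dat, interval = "prediction", level = 0.95)

# 4. classify points outside the prediction interval as outliers
dat$outlier <- dat$count < pred[, "lwr"] | dat$count > pred[, "upr"]
dat[dat$outlier, ]
```

With this simulated series, the linear model has the lower AIC and the inflated
last day falls outside the prediction interval.
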
The algorithm can be applied with a fixed, user-specified value of *k* to
monitor potential changes in the most recent time period. Alternatively, the
optimal value of *k* can be determined automatically.
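
To see why the choice of *k* matters, the sketch below reruns a simplified
version of the procedure for several fixed values of *k* and counts how many
of the last *k* points are flagged. The helper `count_recent_outliers()`, the
simulated data and the single linear model are hypothetical, and this is not
how *trendbreaker* selects *k* automatically.

```r
# Hypothetical illustration of the effect of k (not the automatic selection
# procedure used by trendbreaker)
set.seed(1)

# simulated series with an artificial shift over the last 5 days
dat <- data.frame(day = 1:40)
dat$count <- 50 + dat$day + rnorm(40, sd = 2)
dat$count[36:40] <- dat$count[36:40] + 30

# fit a linear trend without the last k points, then count how many of
# those k points fall outside the 95% prediction interval
count_recent_outliers <- function(k, data) {
  train <- head(data, -k)
  recent <- tail(data, k)
  fit <- lm(count ~ day, data = train)
  pred <- predict(fit, newdata = recent, interval = "prediction", level = 0.95)
  sum(recent$count < pred[, "lwr"] | recent$count > pred[, "upr"])
}

k_values <- 3:10
n_outliers <- vapply(k_values, count_recent_outliers, integer(1), data = dat)
data.frame(k = k_values, n_outliers = n_outliers)
```

When *k* is smaller than the length of the recent change, shifted points enter
the training set, which can widen the interval and mask part of the change.
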
**Disclaimer:** this is a work in progress. Please reach out to the authors
before using this package. Also note that this package may soon be renamed to
avoid clashes with other projects and to reflect a more general scope.
## Getting started
Once it is released on [CRAN](https://CRAN.R-project.org), you will be able to
install the stable version of the package with:
``` r
install.packages("trendbreaker")
```
The development version can be installed from [GitHub](https://github.com/) with:
``` r
# install 'remotes' only if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("reconhub/trendbreaker")
```
The best way to start using this package is to read the documentation of the
function `asmodee` and run its example:
```{r eval = FALSE}
library(trendbreaker)
?asmodee
example(asmodee)
```
## Main features
### ASMODEE
We illustrate ASMODEE using publicly available NHS pathways data recording
self-reporting of potential COVID-19 cases in England
(see `?nhs_pathways_covid19` for more information).
```{r asmodee}
library(trendbreaker) # for ASMODEE
library(dplyr)        # for data manipulation
library(future)       # for parallel processing
plan("multisession")

# load data
data(nhs_pathways_covid19)

# select last 6 weeks of data
first_date <- max(nhs_pathways_covid19$date, na.rm = TRUE) - 6 * 7
pathways_recent <- nhs_pathways_covid19 %>%
  filter(date >= first_date)

# define candidate models
models <- list(
  regression = lm_model(count ~ day),
  poisson_constant = glm_model(count ~ 1, family = "poisson"),
  negbin_time = glm_nb_model(count ~ day),
  negbin_time_weekday = glm_nb_model(count ~ day + weekday)
)

# aggregate counts over all data (one row per date)
counts_overall <- pathways_recent %>%
  group_by(date, day, weekday) %>%
  summarise(count = sum(count))

# results with fixed 'k' = 7
res <- asmodee(
  counts_overall,
  models,
  k = 7,
  date_index = "date",
  method = evaluate_aic,
  simulate_pi = TRUE
)
res
plot(res, "date")
```
ASMODEE is typically most useful for investigating shifts in temporal trends
across a large number of time series (e.g. at a fine geographic scale). To make
this sort of analysis easier, *trendbreaker* also works with
[*incidence2*](https://github.com/reconhub/incidence2/) objects. To illustrate
this, we consider trends across NHS regions.
```{r, incidence2, message=FALSE}
library(incidence2) # for `incidence()` objects

# select last 6 weeks of data
first_date <- max(nhs_pathways_covid19$date, na.rm = TRUE) - 6 * 7
pathways_recent <- filter(nhs_pathways_covid19, date >= first_date)

# create incidence object with extra variables
lookup <- select(pathways_recent, date, day, weekday) %>% distinct()
dat <-
  pathways_recent %>%
  incidence(date_index = date, groups = nhs_region, count = count) %>%
  left_join(lookup, by = c("date_index" = "date"))

# define candidate models
models <- list(
  regression = lm_model(count ~ day),
  poisson_constant = glm_model(count ~ 1, family = "poisson"),
  negbin_time = glm_nb_model(count ~ day),
  negbin_time_weekday = glm_nb_model(count ~ day + weekday)
)

# analyses by NHS region
res <- asmodee(dat, models, method = evaluate_aic, k = 7)
plot(res)
```
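
Conceptually, analysing many time series amounts to applying the same procedure
to each group separately. The base R sketch below mimics this on simulated
data. The function `flag_recent()`, the region names and the single linear
model are hypothetical; with *trendbreaker*, the grouped *incidence2* object is
passed to `asmodee()` directly, as in the chunk above, so no such loop is
needed.

```r
# Hypothetical per-group workflow in base R (not trendbreaker's code)
set.seed(42)

# simulated daily counts for three made-up regions
regions <- c("North", "South", "East")
dat <- expand.grid(day = 1:28, region = regions, stringsAsFactors = FALSE)
dat$count <- rpois(nrow(dat), lambda = 200)

# artificial surge in one region during the last week
surge <- dat$region == "South" & dat$day > 21
dat$count[surge] <- dat$count[surge] + 150

# fit a linear trend without the last k days of one region, then flag
# every point outside the 95% prediction interval
flag_recent <- function(d, k = 7) {
  d <- d[order(d$day), ]
  train <- head(d, -k)
  fit <- lm(count ~ day, data = train)
  pred <- predict(fit, newdata = d, interval = "prediction", level = 0.95)
  d$outlier <- d$count < pred[, "lwr"] | d$count > pred[, "upr"]
  d
}

# apply the same procedure to each region separately
res_by_region <- lapply(split(dat, dat$region), flag_recent)

# number of flagged points among the last 7 days of each region
n_flagged <- vapply(
  res_by_region,
  function(d) sum(tail(d$outlier, 7)),
  integer(1)
)
n_flagged
```

In this simulation, all seven days of the surge in the "South" region are
flagged.
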