Commit fa0c9d9 (1 parent: 4c2e516)
Showing 21 changed files with 1,480 additions and 0 deletions.
Binary file added (+70.1 KB): docs/articles/2_nowcasting_importance_files/figure-html/after_one_year-1.png
Binary file added (+52 KB): docs/articles/2_nowcasting_importance_files/figure-html/no_age_data-1.png
Binary file added (+63.3 KB): docs/articles/2_nowcasting_importance_files/figure-html/no_age_plot-1.png
Binary file added (+72.3 KB): docs/articles/3_forecasting_files/figure-html/after_one_year_forecasting-1.png
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
<html> | ||
<head> | ||
<meta http-equiv="refresh" content="0;URL=https://covid19br.github.io/nowcaster/articles/1_structured_data.html" /> | ||
<meta name="robots" content="noindex"> | ||
<link rel="canonical" href="https://covid19br.github.io/nowcaster/articles/1_structured_data.html"> | ||
</head> | ||
</html> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
<html> | ||
<head> | ||
<meta http-equiv="refresh" content="0;URL=https://covid19br.github.io/nowcaster/articles/2_nowcasting_importance.html" /> | ||
<meta name="robots" content="noindex"> | ||
<link rel="canonical" href="https://covid19br.github.io/nowcaster/articles/2_nowcasting_importance.html"> | ||
</head> | ||
</html> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
<html> | ||
<head> | ||
<meta http-equiv="refresh" content="0;URL=https://covid19br.github.io/nowcaster/articles/3_forecasting.html" /> | ||
<meta name="robots" content="noindex"> | ||
<link rel="canonical" href="https://covid19br.github.io/nowcaster/articles/3_forecasting.html"> | ||
</head> | ||
</html> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
--- | ||
title: "Structured data" | ||
output: rmarkdown::html_vignette | ||
author: "Rafael Lopes & Leonardo Bastos" | ||
vignette: > | ||
%\VignetteIndexEntry{Structured data} | ||
%\VignetteEngine{knitr::rmarkdown} | ||
%\VignetteEncoding{UTF-8} | ||
--- | ||
|
||
```{r echo=FALSE} | ||
knitr::opts_chunk$set( | ||
collapse = TRUE, | ||
comment = "#>", | ||
warning = F, | ||
message = F, | ||
echo = T | ||
) | ||
``` | ||
|
||
As in the Get Started article, we begin by loading the package and its lazy-loaded data: | ||
|
||
```{r data-bh} | ||
library(nowcaster) | ||
# Loading Belo Horizonte SARI dataset | ||
data(sragBH) | ||
``` | ||
|
||
## Non-structured data | ||
|
||
The Get Started example is an estimation on non-structured data. Here we describe this type of data in more detail and show how it can change the nowcasting estimate. | ||
|
||
Now we call the nowcasting function. By default it is parametrized to estimate from data in non-structured form. The model fits a negative binomial distribution, $NegBinom(\lambda_{t,d}, \phi)$, to the case counts at time $t$ with delay $d$, where $\phi$ is the dispersion parameter. The rate $\lambda_{t,d}$ is parametrized in log-linear form as a constant term plus structured delay random effects and structured time random effects. Hence, the model is given by: | ||
|
||
$$\begin{equation} | ||
Y_{t,d} \sim NegBinom(\lambda_{t,d}, \phi), \\ | ||
\log(\lambda_{t,d}) = \alpha + \beta_t + \gamma_d, \\ | ||
t=1,2,\ldots,T, \\ d=1,2,\ldots,D, | ||
\end{equation}$$ | ||
|
||
where the intercept $\alpha$ follows a Gaussian distribution with a very large variance, $\beta_t$ follows a second-order random walk with precision $\tau_\beta$, and $\gamma_d$ follows a first-order random walk with precision $\tau_\gamma$. The model is completed by the INLA default prior distributions for $\phi$, $\tau_\beta$, and $\tau_\gamma$. See the [`nbinomial`](https://inla.r-inla-download.org/r-inla.org/doc/likelihood/likelihood-example.pdf), [`rw1`](https://inla.r-inla-download.org/r-inla.org/doc/latent/rw1.pdf) and [`rw2`](https://inla.r-inla-download.org/r-inla.org/doc/latent/rw2.pdf) INLA help pages. | ||
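
For reference, the two random-walk priors named above can be written out in terms of their increments. This is the usual definition of first- and second-order random walks, as described in the INLA `rw1` and `rw2` help pages; the index ranges are an assumption that follows the indexing of the model above.

$$\begin{equation}
\gamma_d - \gamma_{d-1} \sim N(0, \tau_\gamma^{-1}), \quad d=2,\ldots,D, \\
\beta_t - 2\beta_{t-1} + \beta_{t-2} \sim N(0, \tau_\beta^{-1}), \quad t=3,\ldots,T.
\end{equation}$$

The first-order walk penalizes jumps between neighbouring delays, while the second-order walk penalizes changes in slope over time, which yields a smoother time trend.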
|
||
Calling the function is straightforward: it simply needs a dataset as input, here the `LazyData` loaded in the namespace of the package. The function has 3 mandatory parameters: `dataset`, to pass the dataset to be nowcasted; `date_onset`, to pass the name of the column holding the date of onset of symptoms; and `date_report`, to pass the name of the column holding the date of report of the cases. Here these columns are "DT_SIN_PRI" and "DT_DIGITA", respectively. | ||
|
||
```{r no_age} | ||
nowcasting_bh_no_age <- nowcasting_inla(dataset = sragBH, | ||
date_onset = DT_SIN_PRI, | ||
date_report = DT_DIGITA) | ||
head(nowcasting_bh_no_age$total) | ||
``` | ||
|
||
The call above returns only the nowcasting estimate and its credible interval (CI) at two credibility levels: `LIb` and `LSb` are the lower and upper limits of the 50% CI, and `LI` and `LS` are the lower and upper limits of the 95% CI. | ||
|
||
`nowcasting_inla` can also return the data on which the model's window of action was set: if the `data.by.week` parameter is set to `TRUE`, the second element of the output list contains the data summarized by week. | ||
|
||
```{r no_age_data} | ||
nowcasting_bh_no_age <- nowcasting_inla(dataset = sragBH, | ||
date_onset = DT_SIN_PRI, | ||
date_report = DT_DIGITA, | ||
data.by.week = T) | ||
head(nowcasting_bh_no_age$data) | ||
``` | ||
|
||
This element holds the case counts for each amount of delay. It is known as the delay triangle: if we tabulate the amount of delay against the date of onset of first symptoms, we can see the pattern of delay in the cases. | ||
|
||
```{r delay-triangle} | ||
library(dplyr) | ||
data_triangle <- nowcasting_bh_no_age$data |> | ||
filter(delay < 30) |> | ||
arrange(delay) %>% select(-Time) | ||
# delay_triangle<-table(data_triangle$dt_event, | ||
# data_triangle$delay, | ||
# dnn = list("Date of Onset", "Delay")) | ||
data_triangle %>% | ||
tidyr::spread(key = delay, value = Y) | ||
``` | ||
|
||
We look only at the cases with a delay of 30 weeks or less, which is the default maximum delay considered in the nowcasting estimation. | ||
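
To make the shape of the delay triangle concrete, the following is a minimal base R sketch on a made-up line list. The data frame `cases` and its columns `onset_week` and `report_week` are hypothetical and are not part of `nowcaster`; the sketch only illustrates the tabulation of delay against onset, not the package's internal code.

```r
# Hypothetical line list: onset week and report week of nine cases
cases <- data.frame(
  onset_week  = c(1, 1, 1, 2, 2, 2, 3, 3, 4),
  report_week = c(1, 2, 3, 2, 3, 4, 3, 4, 4)
)

# Delay, in weeks, between onset and report
cases$delay <- cases$report_week - cases$onset_week

# Rows: onset week; columns: delay in weeks; cells: number of cases
delay_triangle <- table(
  factor(cases$onset_week, levels = 1:4),
  factor(cases$delay, levels = 0:2),
  dnn = c("onset_week", "delay")
)
print(delay_triangle)
```

With week 4 as the present, the most recent onset weeks have counts only at the shortest delays, so the lower-right corner of the table is empty. These are the cells that nowcasting has to estimate.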
|
||
If this element is grouped and summarized by the date of onset of symptoms, here `DT_SIN_PRI`, the result is the observed epidemiological curve. To illustrate, we plot the estimate and the epidemiological curve together. | ||
|
||
```{r no_age_plot} | ||
library(ggplot2) | ||
data_by_week <- nowcasting_bh_no_age$data |> | ||
dplyr::group_by(dt_event) |> | ||
dplyr::reframe( | ||
observed = sum(Y, na.rm = T) | ||
) |> | ||
dplyr::filter(dt_event >= max(dt_event)-270) | ||
nowcasting_bh_no_age$total |> | ||
filter(dt_event >= (max(dt_event)-270)) |> | ||
ggplot(aes(x = dt_event, y = Median, col = 'Nowcasting')) + | ||
geom_line(data = data_by_week, | ||
aes(x = dt_event, y = observed, col = 'Observed'))+ | ||
geom_ribbon(aes(ymin = LI, ymax = LS, col = NA), alpha = 0.2, show.legend = F)+ | ||
geom_line()+ | ||
theme_bw()+ | ||
theme(legend.position = "bottom", axis.text.x = element_text(angle = 90)) + | ||
scale_color_manual(values = c('grey50', 'black'), name = '')+ | ||
scale_x_date(date_breaks = '2 weeks', date_labels = '%V/%y', name = 'Date in Weeks')+ | ||
labs(x = '', y = 'Nº Cases') | ||
``` | ||
|
||
|
||
## Structured data, Age | ||
|
||
For structured data, `nowcasting_inla()` again fits a negative binomial distribution to the case counts at time $t$ with delay $d$. Unlike the non-structured case, the model now assigns random effects to the delay distribution and to the time distribution for each of the age classes chosen by the user to split the data. The model now has the form: | ||
|
||
$$\begin{equation}Y_{t,d,a} \sim NegBinom(\lambda_{t,d,a}, \phi), \\ | ||
\log(\lambda_{t,d,a}) = \alpha_a + \beta_{t,a} + \gamma_{d,a}, \\ \quad t=1,2,\ldots,T, \\ d=1,2,\ldots,D, \\ a=1,2,\ldots,A, \end{equation}$$ | ||
|
||
where each age class, $a$, has an intercept $\alpha_a$ following a Gaussian distribution with a very large variance. The time-age random effects, $\beta_{t,a}$, follow a joint multivariate Gaussian distribution with separable variance components: an independent Gaussian term for the age classes with precision $\tau_{age,\beta}$ and a second-order random walk term for the time with precision $\tau_{\beta}$. Analogously, the delay-age random effects, $\gamma_{d,a}$, follow a joint multivariate Gaussian distribution with separable variance components: an independent Gaussian term for the age classes with precision $\tau_{age,\gamma}$ and a first-order random walk term for the delay with precision $\tau_{\gamma}$. The model is completed by the INLA default prior distributions for $\phi$, $\tau_{age,\beta}$, $\tau_{age,\gamma}$, $\tau_{\beta}$ and $\tau_\gamma$. See the `nbinomial`, `iid`, `rw1` and `rw2` INLA help pages. | ||
|
||
This new model corrects the delay taking into account the effects of the age classes and the interactions of each age class with time and with delay. The model now needs to know which column of the dataset will be used to split the data into age classes and how the age classes will be split. This is given by the parameters `age_col` and `bins_age`. Two further parameters are relevant: `data.by.week`, to return the epidemiological curve outside the window of action of the nowcasting estimate, and `return.age`, to request the nowcasting result in two forms, the total aggregated estimate and the age-stratified estimate. The function call has the following form: | ||
|
||
```{r nowcasting} | ||
nowcasting_bh_age <- nowcasting_inla(dataset = sragBH, | ||
bins_age = "10 years", | ||
data.by.week = T, | ||
date_onset = DT_SIN_PRI, | ||
date_report = DT_DIGITA, | ||
age_col = Idade) | ||
``` | ||
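
To illustrate what splitting by `bins_age = "10 years"` means, here is a base R sketch that assigns made-up ages to 10-year classes with `cut()`. The vector `ages`, the break points and the open-ended last class are assumptions chosen for illustration; the exact classes built internally by `nowcasting_inla()` are not shown in this article and may differ.

```r
# Hypothetical ages, in years
ages <- c(3, 17, 25, 40, 59, 61, 88)

# Assumed 10-year classes: [0,10), [10,20), ..., [70,80), and 80 or older
breaks <- c(seq(0, 80, by = 10), Inf)
age_class <- cut(ages, breaks = breaks, right = FALSE)

# Number of cases falling in each age class
counts <- table(age_class)
print(counts)
```

Each case then contributes to the delay triangle of its own age class, which is what allows the time and delay effects to differ between classes.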
|
||
Each of the estimates returned by `nowcasting_inla` has the same form as in the non-structured case. For the nowcasting estimates, it returns a data.frame with the posterior median and the 50% and 95% credible intervals, (`LIb` and `LSb`) and (`LI` and `LS`) respectively. | ||
|
||
```{r plot} | ||
library(ggplot2) | ||
dados_by_week <- nowcasting_bh_age$data |> | ||
dplyr::group_by(dt_event) |> | ||
dplyr::reframe( | ||
observed = sum(Y, na.rm = T) | ||
) |> | ||
dplyr::filter(dt_event >= max(dt_event)-270) | ||
nowcasting_bh_age$total |> | ||
ggplot()+ | ||
geom_line(aes(x = dt_event, y = Median, col = 'Nowcasting'))+ | ||
geom_line(data = dados_by_week, | ||
aes(x = dt_event, y = observed, col = "Observed"))+ | ||
geom_ribbon(aes(x = dt_event, y = Median, | ||
ymin = LI, ymax = LS), alpha = 0.2, show.legend = F)+ | ||
theme_bw()+ | ||
theme(legend.position = "bottom", axis.text.x = element_text(angle = 90))+ | ||
scale_color_manual(values = c('grey50', 'black'), name = '')+ | ||
scale_x_date(date_breaks = '2 weeks', date_labels = '%V/%y', name = 'Date in Weeks')+ | ||
labs(x = '', y = 'Nº Cases') | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
--- | ||
title: "Nowcasting for decision making" | ||
output: rmarkdown::html_vignette | ||
author: "Rafael Lopes & Leonardo Bastos" | ||
vignette: > | ||
%\VignetteIndexEntry{Nowcasting for decision making} | ||
%\VignetteEngine{knitr::rmarkdown} | ||
%\VignetteEncoding{UTF-8} | ||
--- | ||
|
||
```{r echo=FALSE} | ||
knitr::opts_chunk$set( | ||
collapse = TRUE, | ||
comment = "#>", | ||
warning = F, | ||
message = F, | ||
echo = T | ||
) | ||
``` | ||
|
||
## Nowcasting as a tool to support decision making | ||
|
||
Nowcasting a rising curve, or a curve at any other moment, can give quantitative support for decision making. During a public health crisis, what is most needed is a way to anticipate, or at least to know, what is happening at the moment. Nowcasting is the tool for this type of question and can give insights from the data to support the decisions needed. | ||
|
||
We start this section by cutting the original data at a moment of apparent decay in SARI hospitalisations for the city of Belo Horizonte, which had a prompt initial response to the Covid-19 pandemic. The pressure on the health system took longer to build than in the rest of the country, and at the same time the data were showing a decay. We keep all cases entered up to 4 July 2020 by date of digitization, the date on which a case shows up in the database. | ||
|
||
```{r no_age_data} | ||
library(tidyverse) | ||
library(lubridate) | ||
library(nowcaster) | ||
## To see Nowcasting as if we were on the verge of rise in the curve | ||
data("sragBH") | ||
srag_now<-sragBH |> | ||
filter(DT_DIGITA <= "2020-07-04") | ||
data_by_week <- data.w_no_age(dataset = srag_now, | ||
date_onset = DT_SIN_PRI, | ||
date_report = DT_DIGITA) |> | ||
group_by(date_onset) |> | ||
tally() | ||
data_by_week |> | ||
ggplot(aes(x = date_onset, | ||
y = n))+ | ||
geom_line()+ | ||
theme_bw()+ | ||
labs(x = 'Date of onset of symptoms', | ||
y = 'Nº Cases')+ | ||
scale_color_manual(values = c('grey50', 'black'), | ||
name = '')+ | ||
scale_x_date(date_breaks = '2 weeks', | ||
date_labels = '%V/%y', | ||
name = 'Date in Weeks') | ||
``` | ||
|
||
On this filtered data, we estimate the cases whose symptoms have already started but which have not yet been reported, and so are not yet in the database. We simply pass to the `nowcasting_inla` function the filtered dataset and the columns holding `date_onset` and `date_report`, and we set the flag that makes the function return the epidemic curve by epiweek. | ||
|
||
```{r now_no_age_data} | ||
nowcasting_bh_no_age <- nowcasting_inla(dataset = srag_now, | ||
date_onset = DT_SIN_PRI, | ||
date_report = DT_DIGITA, | ||
data.by.week = T) | ||
head(nowcasting_bh_no_age$data) | ||
``` | ||
|
||
Before looking at the result of the nowcasting estimate, we look at an intermediate part of the process, the delay triangle, which sets up the objects for nowcasting. The delay triangle is simply a table in which the counts for each unique amount of delay (i.e. the integer number of days or weeks elapsed between date of onset and date of report) are spread over each date of onset. The part closest to the present has fewer counts and only small amounts of delay. This is expected: since the system takes time to process the cases, the most recent onset dates have fewer reported cases than the older ones, which have already had time to be processed. | ||
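
The effect of reporting delay on the most recent dates can be sketched with a few made-up cases in base R. The data frame `cases`, its columns `onset` and `report`, and the dates in it are hypothetical; only the cut-off date of 4 July 2020 is taken from this article.

```r
# Hypothetical line list: onset and report dates of six cases
cases <- data.frame(
  onset  = as.Date(c("2020-06-01", "2020-06-01", "2020-06-15",
                     "2020-06-15", "2020-06-29", "2020-06-29")),
  report = as.Date(c("2020-06-05", "2020-06-20", "2020-06-18",
                     "2020-07-10", "2020-07-02", "2020-07-20"))
)

# Only cases already reported by the cut-off are visible "now"
cutoff <- as.Date("2020-07-04")
known_now <- cases[cases$report <= cutoff, ]

# Counts by onset date: as seen at the cut-off versus once all reports arrive
onset_levels <- unique(as.character(cases$onset))
observed <- table(factor(as.character(known_now$onset), levels = onset_levels))
final <- table(factor(as.character(cases$onset), levels = onset_levels))
print(rbind(observed, final))
```

In this toy example every onset date ends up with two cases, but at the cut-off the two most recent onset dates show only one each. The observed curve therefore appears to decay even though the final counts are flat.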
|
||
From the data in weekly format we build the counts of cases by amount of delay, tabulating the amount of delay against the date of onset of first symptoms to see the pattern of delay in the cases. | ||
|
||
```{r delay-triangle} | ||
data_triangle <- nowcasting_bh_no_age$data |> | ||
filter(delay < 30) |> | ||
arrange(delay) %>% select(-Time) | ||
# delay_triangle<-table(data_triangle$dt_event, | ||
# data_triangle$delay, | ||
# dnn = list("Date of Onset", "Delay")) | ||
data_triangle %>% | ||
tidyr::spread(key = delay, value = Y) | ||
``` | ||
|
||
We look only at the cases with a delay of 30 weeks or less, which is the default maximum delay considered in the nowcasting estimation. It can be changed with the parameter `Dmax`. | ||
|
||
If this element is grouped and summarized by the date of onset of symptoms, here `DT_SIN_PRI`, the result is the observed epidemiological curve. To illustrate, we plot the estimate and the epidemiological curve together. | ||
|
||
```{r no_age_plot} | ||
library(ggplot2) | ||
dados_by_week <- nowcasting_bh_no_age$data |> | ||
dplyr::group_by(dt_event) |> | ||
dplyr::reframe( | ||
observed = sum(Y, na.rm = T) | ||
) | ||
nowcasting_bh_no_age$total |> | ||
ggplot(aes(x = dt_event, y = Median, | ||
col = 'Nowcasting')) + | ||
geom_line(data = dados_by_week, | ||
aes(x = dt_event, y = observed, | ||
col = 'Observed'))+ | ||
geom_ribbon(aes(ymin = LI, ymax = LS, col = NA), | ||
alpha = 0.2, | ||
show.legend = F)+ | ||
geom_line()+ | ||
theme_bw()+ | ||
theme(legend.position = "bottom", | ||
axis.text.x = element_text(angle = 90)) + | ||
scale_color_manual(values = c('grey50', 'black'), | ||
name = '')+ | ||
scale_x_date(date_breaks = '2 weeks', | ||
date_labels = '%V/%y', | ||
name = 'Date in Weeks')+ | ||
labs(x = '', | ||
y = 'Nº Cases') | ||
``` | ||
|
||
As expected, the nowcasting estimated a rise in the curve where a decay was observed. We now add to the plot what actually happened in that period, using the data entered after the period for which the nowcasting estimated the rise in SARI hospitalizations. | ||
|
||
```{r after_one_year} | ||
nowcasting_bh_no_age$total %>% | ||
ggplot(aes(x = dt_event, y = Median, col = 'Nowcasting')) + | ||
geom_line(data = dados_by_week, | ||
aes(x = dt_event, y = observed, col = 'Observed'))+ | ||
geom_ribbon(aes(ymin = LI, ymax = LS, col = NA), | ||
alpha = 0.2, | ||
show.legend = F)+ | ||
geom_line()+ | ||
geom_line( data = sragBH %>% | ||
filter(DT_SIN_PRI <= "2020-07-04") %>% | ||
mutate( | ||
D_SIN_PRI_2 = DT_SIN_PRI - as.numeric(format(DT_SIN_PRI, "%w")) | ||
) %>% | ||
group_by(D_SIN_PRI_2) %>% | ||
tally(), | ||
mapping = aes(x = D_SIN_PRI_2, y = n, | ||
color = "Observed after one year")) + | ||
theme_bw() + | ||
theme(legend.position = "bottom", | ||
axis.text.x = element_text(angle = 90)) + | ||
scale_color_manual(values = c('grey50', 'black', 'red'), | ||
name = '')+ | ||
scale_x_date(date_breaks = '2 weeks', | ||
date_labels = '%V/%y', | ||
name = 'Date in Weeks')+ | ||
labs(x = '', | ||
y = 'Nº Cases') | ||
``` | ||
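
The chunk above aligns each onset date to the start of its week by subtracting the weekday number returned by `format(x, "%w")`, where Sunday is 0. A minimal base R sketch of that step on a few hand-picked dates (the vector `dates` is made up for illustration):

```r
# Hypothetical onset dates around the cut-off used in this article
dates <- as.Date(c("2020-06-28", "2020-07-01", "2020-07-04", "2020-07-05"))

# "%w" gives the weekday as a number with Sunday = 0, so subtracting it
# moves every date back to the Sunday that starts its week
week_start <- dates - as.numeric(format(dates, "%w"))
print(week_start)

# Weekly counts follow by tabulating the week-start dates
weekly_counts <- table(as.character(week_start))
print(weekly_counts)
```

The first three dates fall in the week starting on Sunday 28 June 2020 and the last one starts a new week, so the weekly counts are 3 and 1.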
|
||
This ends the first simple example of estimating events that have already started but are not yet reported (i.e. nowcasting). The relevance of nowcasting for public health decisions comes from the understanding that what is present in the databases is only a partial picture of the real-time situation. The graph above can help policy makers decide what to do in the face of a rising curve of cases, hospitalisations or deaths. |