Issue 40: Add example test data (#41)

* Add example test data generation in data-raw and as .rda
* Make the test data internal
* Delete PNAS data (assume copied here from another package)
* Revert back to non-internal data
* Add data documentation
* Move simulated data generation from data-raw to tests/testthat
* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data
* Remove simulated data from data/
* Clean the ebola column names
* Document ebola data
* Add readxl and janitor to Suggests
* Add newline to end of file
* Add ask to cite and spacing change
* Change name to "ebola_outbreak_sierra_leone" and document roxygen2
* Fix linter issues
* Rename to sierra_leone_ebola_outbreak_data
* Use shorter ebola data name, and change code style

Former-commit-id: d140527
Former-commit-id: 4ec821ababa950c635bdc96ad43a3dc1def7f70a
epinowcast · May 16, 2024 · 1cbaf82
1 parent 920a1f2
commit 1cbaf82

Showing 9 changed files with 80 additions and 8,360 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -32,7 +32,9 @@ Imports:
 Suggests:
     bookdown,
     epinowcast,
-    testthat (>= 3.0.0)
+    testthat (>= 3.0.0),
+    readxl,
+    janitor
 Remotes:
     stan-dev/cmdstanr,
     Rdatatable/data.table,
@@ -42,3 +44,6 @@ Config/Needs/website:
     epinowcast/enwtheme
 Config/testthat/edition: 3
 URL: https://epidist.epinowcast.org/
+Depends: 
+    R (>= 2.10)
+LazyData: true
diff --git a/R/data.R b/R/data.R
@@ -0,0 +1,19 @@
+#' Ebola linelist data from Fang et al. (2016)
+#'
+#' Linelist data for the Ebola virus collected in Sierra Leone. If you use this
+#' data in your work, please cite the corresponding paper.
+#'
+#' @format A `tibble` with 8,358 rows and 8 columns:
+#' \describe{
+#'   \item{id}{Unique identification number for the case}
+#'   \item{name}{Name as character, omitted}
+#'   \item{age}{Age as numeric}
+#'   \item{sex}{Sex as character, either "F", "M" or NA}
+#'   \item{date_of_symptom_onset}{The date symptoms began}
+#'   \item{date_of_sample_tested}{The date the sample was tested}
+#'   \item{district}{The district (ADM2)}
+#'   \item{chiefdom}{The chiefdom (ADM3)}
+#' }
+#' @family data
+#' @source <https://www.pnas.org/doi/full/10.1073/pnas.1518587113>
+"sierra_leone_ebola_data"
diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -36,3 +36,7 @@ reference:
   desc: Functions and helper functions for plotting
   contents:
   - has_concept("plot")
+- title: Data
+  desc: Data included with the package
+  contents:
+  - has_concept("data")
diff --git a/data-raw/pnas.1518587113.sd02.csv b/data-raw/pnas.1518587113.sd02.csv
diff --git a/data-raw/pnas.1518587113.sd02.xlsx b/data-raw/pnas.1518587113.sd02.xlsx
diff --git a/data-raw/process_raw_data.R b/data-raw/process_raw_data.R
@@ -0,0 +1,5 @@
+sierra_leone_ebola_data <-
+  readxl::read_xlsx("data-raw/pnas.1518587113.sd02.xlsx") |>
+  janitor::clean_names()
+
+usethis::use_data(sierra_leone_ebola_data, overwrite = TRUE)
diff --git a/data/sierra_leone_ebola_data.rda b/data/sierra_leone_ebola_data.rda
diff --git a/man/ebola_outbreak_sierra_leone.Rd b/man/ebola_outbreak_sierra_leone.Rd
diff --git a/tests/testthat/setup.R b/tests/testthat/setup.R
@@ -0,0 +1,15 @@
+set.seed(101)
+
+meanlog <- 1.8
+sdlog <- 0.5
+obs_time <- 25
+sample_size <- 200
+
+sim_obs <- simulate_gillespie() |>
+  simulate_secondary(
+    meanlog = meanlog,
+    sdlog = sdlog
+  ) |>
+  observe_process() |>
+  filter_obs_by_obs_time(obs_time = obs_time) %>%
+  .[sample(seq_len(.N), sample_size, replace = FALSE)]
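
The two date columns documented in `R/data.R` (`date_of_symptom_onset` and `date_of_sample_tested`) suggest one likely use of the new dataset: computing an onset-to-test delay per case. The following is a minimal base R sketch of that idea, combined with subsampling without replacement in the spirit of `tests/testthat/setup.R`. It uses a toy data frame whose values are invented for illustration, not taken from the packaged data, and it assumes the date columns are of class `Date`; in the packaged `sierra_leone_ebola_data` they may be stored differently (for example as date-times read by `readxl`), so a conversion step may be needed. The names `toy_linelist`, `delay_days` and `toy_sample` are hypothetical.

```r
# Toy linelist reusing two column names from sierra_leone_ebola_data.
# All values are invented for illustration; they are NOT from Fang et al.
toy_linelist <- data.frame(
  id = 1:6,
  date_of_symptom_onset = as.Date(c(
    "2014-06-01", "2014-06-03", "2014-06-04",
    "2014-06-10", "2014-06-12", "2014-06-15"
  )),
  date_of_sample_tested = as.Date(c(
    "2014-06-05", "2014-06-04", "2014-06-11",
    "2014-06-12", "2014-06-20", "2014-06-18"
  ))
)

# Delay in days from symptom onset to sample testing.
# Subtracting two Date vectors yields a difftime in days.
toy_linelist$delay_days <- as.numeric(
  toy_linelist$date_of_sample_tested - toy_linelist$date_of_symptom_onset
)

# Subsample rows without replacement, analogous to the last step in setup.R
# (which does the same on a data.table via `.N`).
set.seed(101)
sample_size <- 4
rows <- sample(seq_len(nrow(toy_linelist)), sample_size, replace = FALSE)
toy_sample <- toy_linelist[rows, ]
```

With the real data, the same delay calculation would apply after loading the package and, if necessary, coercing both columns with `as.Date()`.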