PR to add individual-level data cfr case study for ebola-like disease #1

CarmenTamayo · 2024-06-19T15:43:20Z

Steps included

Define parameters for simulist, that include a realistic onset-death delay and a longer onset-recovery
Function run_ebola_simulations that uses simulist to create a list with n=100 (for now) linelists
Function tidy_sim_results, that creates 2 new columns (date_onset and date_recovery) and makes sure they're on a date format
Function convert_to_incidence to use incidence2 and have a list of df with daily aggregated data, pivots the tables so they can be used with cfr, and makes sure the names of the columns are correct
Function id_outbreak_peak to create a vector that contains all peak dates for every df using incidence2::estimate_peak
Function id_outbreak_phases_new- 4 key outbreak dates
• Start- first date with > 5 cases
• Peak- using vector from id_outbreak_peak
• Growing- date between the start and the peak with sufficient cases and deaths for cfr calculation
• End-max date
4 different lists of dfs are created
a. Real time outbreak during growing phase (1) and during declining phase (2): using function subset_outbreak_phase, which takes a random date between start and peak for 1, and between peak and end for 2, and uses this date as a cut-off point for the real-time analysis.
b. Real-time outbreak at the peak (3): using a for loop to cut the df at the row that says "peak"
c. (4) For the retrospective-cohort approach, the real-time cut-off is between the "growing" date and the peak date, to ensure there will be enough cases so that the cfr calculation makes sense (i.e., no 0 deaths/cases or deaths/0 cases)
Function estimate_cfr which takes specified columns from data frame to sum(deaths)/sum(cases) and creates a vector with the cfr per dataframe
Function cfr_ci to take cfr numeric vector and estimate mean + 95% CI

…or an ebola-like scenario

…nction as for the other cutoffs

joshwlambert

Nice work @CarmenTamayo. I've left a few comments through the script and on some functions. Happy to discuss any comments that aren't clear. This is just a first run through the code. I'll take a look again soon and provide more detailed comments.

Some additional notes, if you're going to write custom functions for this analysis it would be good to add {roxygen2} documentation blocks and unit tests. I'd be happy to contribute to these.

analyses/Linelist_cs_ebola_simulations.R

joshwlambert · 2024-06-26T12:58:05Z

analyses/Linelist_cs_ebola_simulations.R

+)
+
+ebola_ip_meansd <- c(mean = 9.4, sd = 5.5)
+# From https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4560911/


Is this in {epireview}, if not would be good to add to the {epiparameter} database?

It's actually not in {epireview}, I found it online and it's not parameterised, is it worth adding to the epiparameter db? if so, it'd be helpful if you could run me through the process to make sure I do it properly

R/run_ebola_simulations.R

joshwlambert · 2024-06-26T13:02:34Z

analyses/Linelist_cs_ebola_simulations.R

+
+# Function rum_ebola_simulations uses simulist to create a list with n=100 (for now) linelists
+set.seed(487593478)
+simulations_ebola <- run_ebola_simulations(100)


I think the run_ebola_simulations() function can be removed and replaced with an lapply(), which is good as it is less code to maintain as you run the analyses.

Do you mean doing something like this:

simulations_ebola <- lapply(1:100, function(x){ sim_linelist( contact_distribution = contact_distribution, infectious_period = ip_ebola_epidist, onset_to_hosp = NULL, hosp_risk = NULL, hosp_death_risk = NULL, onset_to_death = o_d_ebola_epidist, onset_to_recovery = o_r_ebola_epidist, prob_infect = 0.7, outbreak_size = c(500, 2000), non_hosp_death_risk = 0.3 ) })

I've tried doing lapply(1:100, sim_linelist, arguments) but it wouldn't let me, should this work?

joshwlambert · 2024-06-26T13:13:25Z

R/convert_to_incidence.R

+  for (old_name in names(rename_vec)) {
+    if (old_name %in% colnames(incidence_data)) {
+      colnames(incidence_data)[colnames(incidence_data) == old_name] <- rename_vec[[old_name]]
+    }
+  }


It feels like this can be made much more efficient by vectorising the renaming. It will also require much less code than currently written. Happy to discuss.

I'm happy to change it but I'm not 100% sure of what you mean, we can discuss it or you can suggest the changes and I'll accept them

joshwlambert · 2024-06-26T13:15:18Z

R/id_outbreak_peak.R

+# Incidence object from linelist has 3 count_variables, but we want the column
+# date_onset only to be considered to estimate the peak
+id_outbreak_peak <- function(linelist) {
+  inc_data <- linelist %>%


This function remakes an <incidence> object, but one is already created in the analysis script before calling this function. Might be worth seeing the <incidence> already made can be passed into this function and save yourself a step.

This is what I meant the other day about having problems with this step, the thing is that in order to be able to use incidence2's function to estimate the outbreak peak, you need to start off an incidence object, and then it takes the date_index column to give you the peak. The problem is that when there are more than 1 elements in date_index, it calculates the peak for each of them, not just for the onset date (in my case, also for the deaths and recoveries). I tried starting off the previous incidence object that I use in the analyses and remove the rows in date_index that said "date_death" and "date_recovery" first so it would only take into account onset dates, but I didn't find it easy to do at the time and I was getting some unexpected results, so I decided it was best to start off the "correct" incidence object
If there is an easier way to do this I'm more than happy to change it

joshwlambert · 2024-06-26T13:19:31Z

R/estimate_cfr.R

@@ -0,0 +1,8 @@
+# Function to calculate cfr by specifying columns from a dataframe with
+# deaths and cases
+estimate_cfr <- function(df, numerator, denominator) {


Can the {cfr} package not calculate this when a delay distribution is not supplied?

yes it can, initially I wasn't using it for 2 reasons, 1/ because the output includes the 95 CI and I just wanted the CFR value, so I thought it would be easier to write a function that is doing exactly what I want it to do, 2/ because I wanted the package to be used just for the comparison of the methods, but I guess I could still use it anyway if a delay distribution is not being used... do you think it would be better?

joshwlambert · 2024-06-26T13:21:20Z

analyses/Linelist_cs_ebola_simulations.R

+library(data.table)
+library(zoo)


Where are {data.table} and {zoo} used? It's personal preference, but you could namespace some functions (e.g. pkgname::func()).

they were used previously but not in the final version, I'll remove them
and do you mean whenever I use a package to include the package name?

analyses/Linelist_cs_ebola_simulations.R

CarmenTamayo added 3 commits June 19, 2024 12:25

Script and functions for cfr case study using individual-level data f…

26d233a

…or an ebola-like scenario

Adding comment about including cutoff point at peak using the same fu…

dc13cef

…nction as for the other cutoffs

Updating simulist use and including hosp_risk and hosp_death_risk

c38ebe7

CarmenTamayo requested a review from joshwlambert June 19, 2024 15:43

joshwlambert reviewed Jun 26, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PR to add individual-level data cfr case study for ebola-like disease #1

PR to add individual-level data cfr case study for ebola-like disease #1

CarmenTamayo commented Jun 19, 2024

joshwlambert left a comment

joshwlambert Jun 26, 2024

CarmenTamayo Jul 2, 2024

joshwlambert Jun 26, 2024

CarmenTamayo Jul 2, 2024

joshwlambert Jun 26, 2024

CarmenTamayo Jul 2, 2024

joshwlambert Jun 26, 2024

CarmenTamayo Jul 2, 2024

joshwlambert Jun 26, 2024

CarmenTamayo Jul 2, 2024

joshwlambert Jun 26, 2024

CarmenTamayo Jul 2, 2024

		library(data.table)
		library(zoo)

PR to add individual-level data cfr case study for ebola-like disease #1

Are you sure you want to change the base?

PR to add individual-level data cfr case study for ebola-like disease #1

Conversation

CarmenTamayo commented Jun 19, 2024

joshwlambert left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment