-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PR to add individual-level data cfr case study for ebola-like disease #1
base: main
Are you sure you want to change the base?
Conversation
…or an ebola-like scenario
…nction as for the other cutoffs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @CarmenTamayo. I've left a few comments through the script and on some functions. Happy to discuss any comments that aren't clear. This is just a first run through the code. I'll take a look again soon and provide more detailed comments.
Some additional notes, if you're going to write custom functions for this analysis it would be good to add {roxygen2} documentation blocks and unit tests. I'd be happy to contribute to these.
) | ||
|
||
ebola_ip_meansd <- c(mean = 9.4, sd = 5.5) | ||
# From https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4560911/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this in {epireview}, if not would be good to add to the {epiparameter} database?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually not in {epireview}, I found it online and it's not parameterised, is it worth adding to the epiparameter db? if so, it'd be helpful if you could run me through the process to make sure I do it properly
|
||
# Function rum_ebola_simulations uses simulist to create a list with n=100 (for now) linelists | ||
set.seed(487593478) | ||
simulations_ebola <- run_ebola_simulations(100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the run_ebola_simulations()
function can be removed and replaced with an lapply()
, which is good as it is less code to maintain as you run the analyses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean doing something like this:
simulations_ebola <- lapply(1:100, function(x){ sim_linelist( contact_distribution = contact_distribution, infectious_period = ip_ebola_epidist, onset_to_hosp = NULL, hosp_risk = NULL, hosp_death_risk = NULL, onset_to_death = o_d_ebola_epidist, onset_to_recovery = o_r_ebola_epidist, prob_infect = 0.7, outbreak_size = c(500, 2000), non_hosp_death_risk = 0.3 ) })
I've tried doing lapply(1:100, sim_linelist, arguments)
but it wouldn't let me, should this work?
for (old_name in names(rename_vec)) { | ||
if (old_name %in% colnames(incidence_data)) { | ||
colnames(incidence_data)[colnames(incidence_data) == old_name] <- rename_vec[[old_name]] | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this can be made much more efficient by vectorising the renaming. It will also require much less code than currently written. Happy to discuss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to change it but I'm not 100% sure of what you mean, we can discuss it or you can suggest the changes and I'll accept them
# Incidence object from linelist has 3 count_variables, but we want the column | ||
# date_onset only to be considered to estimate the peak | ||
id_outbreak_peak <- function(linelist) { | ||
inc_data <- linelist %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function remakes an <incidence>
object, but one is already created in the analysis script before calling this function. Might be worth seeing the <incidence>
already made can be passed into this function and save yourself a step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I meant the other day about having problems with this step, the thing is that in order to be able to use incidence2's function to estimate the outbreak peak, you need to start off an incidence object, and then it takes the date_index column to give you the peak. The problem is that when there are more than 1 elements in date_index, it calculates the peak for each of them, not just for the onset date (in my case, also for the deaths and recoveries). I tried starting off the previous incidence object that I use in the analyses and remove the rows in date_index that said "date_death" and "date_recovery" first so it would only take into account onset dates, but I didn't find it easy to do at the time and I was getting some unexpected results, so I decided it was best to start off the "correct" incidence object
If there is an easier way to do this I'm more than happy to change it
@@ -0,0 +1,8 @@ | |||
# Function to calculate cfr by specifying columns from a dataframe with | |||
# deaths and cases | |||
estimate_cfr <- function(df, numerator, denominator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the {cfr} package not calculate this when a delay distribution is not supplied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it can, initially I wasn't using it for 2 reasons, 1/ because the output includes the 95 CI and I just wanted the CFR value, so I thought it would be easier to write a function that is doing exactly what I want it to do, 2/ because I wanted the package to be used just for the comparison of the methods, but I guess I could still use it anyway if a delay distribution is not being used... do you think it would be better?
library(data.table) | ||
library(zoo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are {data.table} and {zoo} used? It's personal preference, but you could namespace some functions (e.g. pkgname::func()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they were used previously but not in the final version, I'll remove them
and do you mean whenever I use a package to include the package name?
Steps included
• Start- first date with > 5 cases
• Peak- using vector from id_outbreak_peak
• Growing- date between the start and the peak with sufficient cases and deaths for cfr calculation
• End-max date
a. Real time outbreak during growing phase (1) and during declining phase (2): using function subset_outbreak_phase, which takes a random date between start and peak for 1, and between peak and end for 2, and uses this date as a cut-off point for the real-time analysis.
b. Real-time outbreak at the peak (3): using a for loop to cut the df at the row that says "peak"
c. (4) For the retrospective-cohort approach, the real-time cut-off is between the "growing" date and the peak date, to ensure there will be enough cases so that the cfr calculation makes sense (i.e., no 0 deaths/cases or deaths/0 cases)