Add new package infrastructure (epichains classes and methods) #33
Great stuff! A few comments:
- I'm not a fan of the mixture of `vec`/`vect` and think we could do with something more explicit here. Perhaps `_summaries`?
- `estimate_likelihood` doesn't always produce an estimate. If an analytical solution is available then the result is exact. Perhaps this should just be `likelihood`? We would then have to rename the file too and distinguish it from `likelihoods.R` (which should perhaps be `dist_likelihoods.R` or something like that?).
- This is probably going too far, but since you're redesigning anyway you could consider whether you want to implement this along the lines of the more generic modelling functions/packages in R, e.g. have `epichains` (or similar) objects that contain model parameters so one could call `logLik(epichain)` or perhaps even `predict(epichain)` (instead of `simulate`). Again, I'm not at all convinced this is a good idea but I think it might be worth at least thinking about.
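To make the `logLik(epichain)` idea concrete, here is a minimal, hypothetical sketch using base R's S3 conventions. The class name `chain_fit`, the constructor `new_chain_fit()` and the restriction to Poisson offspring are assumptions for illustration only, not part of this PR. With Poisson offspring (mean `lambda` < 1) and a single initial case, chain sizes follow a Borel distribution, which gives a closed-form likelihood.

```r
# Hypothetical sketch (not the PR's API): an object holding observed chain
# sizes plus the mean of a Poisson offspring distribution.
# For Poisson offspring (mean lambda < 1) starting from one case, chain
# sizes follow a Borel distribution:
#   P(size = n) = exp(-lambda * n) * (lambda * n)^(n - 1) / n!
new_chain_fit <- function(sizes, lambda) {
  structure(list(sizes = sizes, lambda = lambda), class = "chain_fit")
}

# S3 method for the stats::logLik() generic
logLik.chain_fit <- function(object, ...) {
  n <- object$sizes
  lambda <- object$lambda
  # Borel log-density summed over all observed chains
  ll <- sum(-lambda * n + (n - 1) * log(lambda * n) - lgamma(n + 1))
  # Return a standard "logLik" object: one estimated parameter (lambda)
  structure(ll, df = 1L, nobs = length(n), class = "logLik")
}

fit <- new_chain_fit(sizes = c(1, 1, 2), lambda = 0.5)
logLik(fit)
```

Returning a standard `logLik` object with a `df` attribute is the convention that generic tools such as `AIC()` rely on, which is the main payoff of aligning with the modelling interface.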
CITATION.cff (outdated)
keywords:
- branching-process
- epidemic-dynamics
- epidemic-modelling
- epidemic-simulations
- outbreak-simulator
- r
- r-package
- transmission-chain
- transmission-chain-reconstruction
Any reason to remove these?
I think the tags were probably automatically removed when the package was forked. I'll re-add them.
I just added the tags back on this repo and have also set up the CITATION.cff syncing workflow here (54a29ff), so that should resolve this.
R/epichains.R (outdated)
)

# print head of the simulation output
print(head(x[!is.na(x$ancestor), ]))
Do we need to sort by ancestor first? Or do we assume users don't mess with the order?
That's a good point. I actually wonder whether it would be best to sort by `sim_id` then `ancestor`, so that it's easier to see the chains within each simulation. I have implemented that here: 6972ed7.
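Sorting by `sim_id` and then `ancestor` before printing can be done with base R's `order()`. This is a minimal sketch, assuming a plain data frame with those columns; `chain_head()` is a hypothetical helper and not necessarily what 6972ed7 implements.

```r
# Hypothetical helper: sort a chains_tree-style data frame by sim_id, then
# ancestor, drop the seed rows (ancestor is NA) and return the first n rows.
chain_head <- function(x, n = 6L) {
  ord <- order(x$sim_id, x$ancestor)  # NA ancestors are placed last
  x_sorted <- x[ord, ]
  head(x_sorted[!is.na(x_sorted$ancestor), ], n)
}

tree <- data.frame(
  sim_id = c(2, 1, 1, 2, 1),
  ancestor = c(1, NA, 1, NA, 2),
  generation = c(2, 1, 2, 1, 3)
)
print(chain_head(tree))
```

Sorting inside the print path (rather than relying on row order) means the output stays readable even if users reorder or bind objects before printing.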
R/likelihood_estimation.R (outdated)
#' @param chains_observed Vector of sizes/lengths of transmission chains.
#' @param chain_statistic Statistic given as \code{chains_observed}
#' ("size" or "length" of chains).
#' @param offspring_sampler Offspring distribution: a character string |
We're not providing the sampler here but the distribution (from which the sampler is obtained internally), so would it make sense to call this `offspring_dist`?
Agreed. I've made the change.
#' simulate_tree_from_pop(pop = 100, offspring_sampler = "nbinom",
#' mean_offspring = 0.5, disp_offspring = 1.1, serial_sampler = function(x) 3)
#' @export
simulate_tree_from_pop <- function(pop, |
Is the plan to eventually merge this into `simulate_tree` with a `pop` option? I think that might make sense if it doesn't make that function too complex.
R/epichains.R (outdated)
if (attributes(x)$chain_type == "chains_tree") {
  stopifnot(
    "object does not contain the correct columns" =
      c("sim_id", "ancestor", "generation", "time") %in%
`time` is not strictly necessary; it is only needed if a serial interval is used.
Fixed in 5d3e108.
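One way to express "`time` is only required with a serial interval" is to build the list of required columns conditionally. This is a hypothetical sketch; `validate_chains_tree()` and its `uses_serial_interval` argument are illustrative names, not necessarily what 5d3e108 does. Named arguments to `stopifnot()` (used as the error message, as in the snippet above) require R >= 4.0.0.

```r
# Hypothetical validator: "time" is only required when the tree was
# simulated with a serial interval.
validate_chains_tree <- function(x, uses_serial_interval = FALSE) {
  required <- c("sim_id", "ancestor", "generation")
  if (uses_serial_interval) {
    required <- c(required, "time")
  }
  # all() makes the single TRUE/FALSE outcome explicit
  stopifnot(
    "object does not contain the correct columns" =
      all(required %in% colnames(x))
  )
  invisible(TRUE)
}

tree <- data.frame(sim_id = 1, ancestor = NA, generation = 1)
validate_chains_tree(tree)  # passes: no "time" column needed
tryCatch(
  validate_chains_tree(tree, uses_serial_interval = TRUE),
  error = function(e) conditionMessage(e)
)
```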
#' @param chain_stat_max A cut-off for the chain statistic (size/length) being
#' computed. Results above the specified value are set to this value.
#' Defaults to `Inf`.
#' @param serials_sampler The serial interval generator function; the name of a
Perhaps rename it to something that contains "interval", e.g. `si_sampler` (though see also my other comment on supporting character strings vs. anonymous functions here; I think there might be an argument for just supporting character strings, in which case this would be `si_dist`).
I think the result of the discussion (including #25 (comment)) is that we want to do this via character strings everywhere instead of functions (though happy to be told otherwise if you read this differently). Perhaps convert my previous comment to an issue for future processing.
R/epichains.R (outdated)
)

# Offer more information to view the full dataset
writeLines(sprintf("Use View(<object_name>) to view the full output."))
`View()` doesn't work well in remote sessions. Perhaps instead have:
- writeLines(sprintf("Use View(<object_name>) to view the full output."))
+ writeLines(sprintf("Use as.data.frame(<object_name>) to view the full output."))
I've made the change here c0c6197.
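A print method along these lines might look like the following. This is a hypothetical sketch: `chains_demo` is an illustrative class name rather than the PR's `epichains` class, and the key detail is converting to a plain data frame first, because `head()` on a data frame subclass keeps the subclass and would otherwise call the same print method again.

```r
# Hypothetical print method for a data.frame-based class ("chains_demo" is
# illustrative). It prints the first rows and points users to
# as.data.frame(), which works in any session, unlike View().
print.chains_demo <- function(x, n = 6L, ...) {
  # as.data.frame() drops the subclass, so head()/print() use the
  # data.frame methods instead of recursing into this method
  df <- as.data.frame(x)
  print(head(df, n), ...)
  writeLines("Use as.data.frame(<object_name>) to view the full output.")
  invisible(x)
}

obj <- structure(
  data.frame(sim_id = rep(1:2, each = 5), generation = rep(1:5, 2)),
  class = c("chains_demo", "data.frame")
)
print(obj)
```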
Definitely worth thinking about, imo. My gut intuition is that some light work would get the basic interface requirements met. It would take a fair bit more to get everything properly aligned so that it "just works" with the tools written against the generic interface, but that would be potentially worthwhile. Re tagging for #25: happy to contribute code to making the lambda-licious version actually happen, if that's the direction people want.
I'm not convinced either way but currently leaning towards using character strings to identify
Of course, the real issue is that the R distribution interface is not particularly clever or sensible; we could consider using e.g. the more sensible distr6, which would remove the issue altogether but at the cost of some overhead.
Why would
Because it works with character strings specifying distributions?
But itself is a function that takes arguments, yes? So has the same contours as passing any other distribution function + arguments, right? As in:
It's a package that provides distribution-based functions (i.e.
See update with example; there's no need to reimplement I don't think you'll want to use the
Ah, but the point is that the truncation could come in separately if the user wants control measures or depletion of susceptibles. Of course, the same could be implemented by the user just providing a truncated distribution, but having these separate would provide convenience arguments.
Hmm - where does So looking at the apparent source (https://github.com/cran/truncdist/blob/master/R/qtrunc.R), it is using Something like this should be able to be made to work:
Yes, I think that would work, and be a workaround to address point (2).
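Truncation as a separate convenience step (e.g. capping offspring at the number of remaining susceptibles) can be done by inverse-transform sampling from base R's `p<dist>`/`q<dist>` pairs, similar in spirit to truncdist's `qtrunc()`. This is a hypothetical sketch covering only an upper bound; `rtrunc_upper()` is an illustrative name, not a function from truncdist or this PR.

```r
# Hypothetical helper: draw from a discrete distribution truncated to
# values <= upper, via inverse-transform sampling with base R's
# p<dist>/q<dist> functions (e.g. pnbinom/qnbinom for dist = "nbinom").
rtrunc_upper <- function(n, dist, upper, ...) {
  pfun <- match.fun(paste0("p", dist))
  qfun <- match.fun(paste0("q", dist))
  # runif() never returns exactly 0 or 1, so u lies in (0, F(upper))
  u <- runif(n) * pfun(upper, ...)
  # the smallest x with F(x) >= u is therefore always <= upper
  qfun(u, ...)
}

set.seed(42)
# e.g. at most 3 susceptibles left to infect
draws <- rtrunc_upper(1000, "nbinom", upper = 3, size = 1.1, mu = 2)
table(draws)
```

Each value k <= `upper` is drawn with probability (F(k) - F(k - 1)) / F(upper), i.e. the original distribution renormalised over the allowed range.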
R/borel.r (outdated)
Convention for documenting the d/r/q/p functions is to use a single rdname.
This is also way too thin on detail for the distribution: its parameters, connection to typical measures (mean, sd, ...), etc.
Probably warrants a devoted issue, rather than tacking it on to this already ... dense PR.
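For reference, the single-`rdname` convention for a d/r pair could look like this roxygen2 sketch. The function bodies and signatures (`dborel(x, mu, log)`, `rborel(n, mu, infinite)`) are assumptions written for illustration, not copied from R/borel.r. The mean 1/(1 - mu) and variance mu/(1 - mu)^3 are standard Borel results for 0 < mu < 1.

```r
#' The Borel distribution
#'
#' Density and random generation for the Borel distribution with parameter
#' `mu`: the distribution of the total size of a branching process with
#' Poisson(`mu`) offspring, started by a single case.
#' P(X = x) = exp(-mu * x) * (mu * x)^(x - 1) / x!, for x = 1, 2, ...
#' For 0 < mu < 1 the mean is 1 / (1 - mu) and the variance is
#' mu / (1 - mu)^3.
#'
#' @param x Vector of chain sizes (positive integers).
#' @param n Number of random draws.
#' @param mu Mean of the Poisson offspring distribution (0 < mu < 1).
#' @param log Logical; if TRUE, return log-densities.
#' @param infinite Size at which simulated chains are stopped; needed for
#'   mu >= 1, where chains may never go extinct.
#' @return `dborel()` gives the (log-)density, `rborel()` random draws.
#' @name borel
NULL

#' @rdname borel
#' @export
dborel <- function(x, mu, log = FALSE) {
  ld <- -mu * x + (x - 1) * log(mu * x) - lgamma(x + 1)
  if (log) ld else exp(ld)
}

#' @rdname borel
#' @export
rborel <- function(n, mu, infinite = Inf) {
  vapply(seq_len(n), function(i) {
    size <- 1
    active <- 1
    while (active > 0 && size < infinite) {
      # offspring of all active cases: a sum of Poissons is Poisson
      active <- rpois(1, active * mu)
      size <- size + active
    }
    min(size, infinite)
  }, numeric(1))
}
```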
Conversation from Slack: @pearsonca I think the upshot from the discussion is that there is likely a way to support the distribution arguments as functions (and probably also backwards-compatible as strings - sniff w/ ...). I'm happy to put my keystrokes where my mouth is and help with sorting out how to make that happen.
I do!
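Supporting distribution arguments as functions while staying backwards-compatible with strings could be as simple as checking the argument type up front. This is a hypothetical sketch: `as_sampler()` and `draw_offspring()` are illustrative names, and the string branch assumes the base R `r<dist>` naming scheme (e.g. `"pois"` maps to `rpois`).

```r
# Hypothetical helper (not the package API): accept the offspring
# distribution either as a sampler function (e.g. rpois, or an anonymous
# function) or, for backwards compatibility, as a string naming a base R
# distribution (e.g. "pois").
as_sampler <- function(dist) {
  if (is.function(dist)) {
    return(dist)
  }
  if (is.character(dist) && length(dist) == 1L) {
    # "pois" -> rpois, "nbinom" -> rnbinom, ...
    return(match.fun(paste0("r", dist)))
  }
  stop("`dist` must be a function or a single character string.")
}

# Distribution parameters are passed through unchanged via `...`
draw_offspring <- function(n, dist, ...) {
  sampler <- as_sampler(dist)
  sampler(n, ...)
}

set.seed(1)
from_string <- draw_offspring(5, "pois", lambda = 2)
set.seed(1)
from_function <- draw_offspring(5, rpois, lambda = 2)
identical(from_string, from_function)
```

Passing parameters through `...` keeps the function interface identical to the string one, so a truncated or user-written sampler can be dropped in without new arguments.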
Codecov Report
@@ Coverage Diff @@
## main #33 +/- ##
=======================================
Coverage ? 43.18%
=======================================
Files ? 8
Lines ? 396
Branches ? 0
=======================================
Hits ? 171
Misses ? 225
Partials ? 0
Co-authored-by: Sebastian Funk <[email protected]>
edcd9bb to 7e9665a
This PR addresses several issues:
- `bpmodels::chain_ll()` to `likelihood()`, to close #5 (refactor chain_ll()) and fix #13 (Refactor chain_sim()). Tests have been delayed to accommodate any changes in name and structure resulting from code review.
- `bpmodels::chain_sim_susc()`, by splitting it into `simulate_tree()` and `simulate_summary()`, to close #14 (refactor chain_sim_susc()). Tests have been delayed to accommodate any changes in name and structure resulting from code review.
- `chain_sim_susc()` to `simulate_tree_from_pop()`. This closes #14 (refactor chain_sim_susc()).
- `chain_ll()` to `likelihood()`. This closes #26 (Rename current functions).
- `epichains` class, to partly fix #4 (Consider S3 output class for `chain_ll()` and `chain_sim()`). `likelihood()`, which is a refactoring of `chain_ll()`, currently does not implement S3 because its use case is currently unknown.

Essentially, merging this PR will introduce:
- `simulate_tree()`: simulates branching processes from a given number of chains.
- `simulate_tree_from_pop()`: simulates branching processes from a given population size and pre-existing immunity.
- `simulate_summary()`: simulates a vector of cluster sizes or lengths from branching processes.
- `likelihood()`: calculates the likelihood/loglihood of observing transmission chains of given sizes or lengths.
- `epichains` class that stores `data.frame`-like output from `simulate_tree()` and `simulate_tree_from_pop()` with a `chains_tree` attribute, and `vector`-like model output from `simulate_summary()` with a `chains_summary` attribute.
- `print()`, `summary()`, and `aggregate()` methods. Note that the `aggregate()` method only works for `epichains` objects with the `chains_tree` attribute.
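The class design described above (one class, with a `chain_type` attribute distinguishing tree and summary output, as checked in the `R/epichains.R` snippet in this thread) could be sketched roughly as follows. `new_epichains()` and the per-generation counting in `aggregate.epichains()` are hypothetical illustrations, not the PR's implementation.

```r
# Hypothetical constructor: tag output with the epichains class and a
# chain_type attribute ("chains_tree" for data.frame output,
# "chains_summary" for vectors of chain sizes/lengths).
new_epichains <- function(x, chain_type = c("chains_tree", "chains_summary")) {
  chain_type <- match.arg(chain_type)
  if (chain_type == "chains_tree") {
    stopifnot(is.data.frame(x))
  } else {
    stopifnot(is.numeric(x))
  }
  structure(x, chain_type = chain_type, class = c("epichains", class(x)))
}

# Hypothetical aggregate() method: count cases per generation. Like the
# method described in the PR, it refuses chains_summary objects.
aggregate.epichains <- function(x, ...) {
  if (attr(x, "chain_type") != "chains_tree") {
    stop("aggregate() only works for epichains objects of type chains_tree.")
  }
  counts <- table(x$generation)
  data.frame(
    generation = as.integer(names(counts)),
    cases = as.vector(counts)
  )
}

tree <- new_epichains(
  data.frame(sim_id = c(1, 1, 1), ancestor = c(NA, 1, 1), generation = c(1, 2, 2)),
  chain_type = "chains_tree"
)
aggregate(tree)
```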