-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full package review for epichains v0.1.0 release #122
Conversation
Thanks @jamesmbaazam - will be reviewing this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for you work on this! I think it's overall in good shape. I've made some comments inline relating to design, performance, documentation & good practices.
In terms of design & usability, I have to main concerns:
-
Some
...
(e.g., inlikelihood()
) are passed to child functions several levels deep. This makes it very difficult to know which arguments this function is taking and where they are documented.
Thenobs_offspring
argument is a good example of that. It's required if you passed a distribution that epichains doesn't know but it's not really explicit from the function documentation. And as far as I can tell, you cannot arrive to the help page describing it from links inlikelihood()
.
I am not yet completely sure about the solution for this. Maybe make these arguments explicit rather than implicit in the ellipses? -
The fact that we have 3 very similar functions, with similar names, similar scopes and similar arguments feels confusing to me.
From a conceptual point of view,simulate_tree()
feels (is?) the same assimulate_from_pop(pop = Inf)
andsimulate_summary()
feels like a downstream analysis function that should be applied to the output ofsimulate_tree()
.
I understand they are separate for technical reasons. I understand thatsimulate_summary()
exists to circumvent the memory issue that arises from the fact we're working with exponential growth processes. I understandsimulate_from_pop()
andsimulate_tree()
have slightly different implementations.
But it's still probably a problem to let technical considerations inform the design of our user interface. Even if we had to, e.g., internally dispatch tosimulate_from_pop()
orsimulate_tree()
, there may be value in having a single user-facing wrapper function.
Tangentially but on a related note, I don't follow why the output ofsimulate_from_pop()
has fewer columns than the one fromsimulate_tree()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting all this together. Have added some comments mostly on vignettes – in particular, think there are some tweaks we could made to improve entry point for new users and highlight complementary aspects of other packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jamesmbaazam for opening this review. I think the package looks mostly alright. I haven't looked into the tests and the vignettes, and I'm laying out some general points below.
General
- I wonder whether the three
simulate_*()
functions can be rolled into one. I'm not able to tell whysimulate_tree()
andsimulate_tree_from_pop()
are different. It should be possible to have both an initial number of cases, and a finite population size. - The
simulate_summary()
function can probably be removed in favour of asummary.epichains()
method for othersimulate_*()
outputs. This would probably cut down on the code and I don't feel thatsimulate_tree()
is too slow to run. - I am not convinced that this package benefits from S3 classes, as they are very thin wrappers around standard structures. One good reason to have a custom class is for methods such as
summary.epichains()
(see point above), or methods to link it with {igraph}/{tidygraph} or {epicontacts}. This could still be done by returning data.frames in an edgelist format (from, to, edge strength) which can be ingested by e.g.igraph::graph_from_data_frame()
- this opens the door to easy plotting. - The vignette on interventions should be rephrased or removed, as the examples are not modelling reactive interventions, but rather scenarios with different pathogen or public-health parameters. It should be possible to implement an intervention that reduces a model parameter between some timepoints (see
<rate_interventions>
in {epidemics}), but this would require you to keep track of the model time separately from the infection's generation time, and move the code in a more (discrete?) time-based direction.
General technical
- Bring package test coverage up to 100%;
- Update the WORDLIST and enable R CMD check failure on spelling errors;
- Run {styler};
- Ensure that the GH workflows are up to date with {packagetemplate}, I haven't checked if they are;
- Go through the spring cleaning tasklist and implement any fixes needed; see also the {usethis} release checklist: e.g. add examples for all exported functions;
- Suggest numeric vectors should never have infinite values, check all instances of
checkmate::assert_*()
and especially do not let the upper value of numerics beInf
; - Set all optional arguments to
NULL
rather than missing, check instances of this, e.g.R/likelihood.R
; - Mention the type of all function arguments and returns in the documentation. E.g. for
likelihood()
,@param chains A numeric vector representing chain sizes or lengths.
; - Be consistent with the use of
return()
; - Prefer the use of
checkmate::assert_*()
orstopifnot(checkmate::test_*())
rather thanif(!checkmate::test_*()) stop()
; - Avoid comments inside temporary functions generated within other function bodies;
- Prefer
rbind(matrix, matrix)
and use matrices rather than data.frames in while loops --- this is likely more efficient --- you can convert to data.frame at the end; - Set a seed for consistency, and set parameter values that result in sizeable outbreaks in In function documentation examples, otherwise examples often return empty
<epichains_tree>
s as the outbreak does not take off --- this might be realistic but not very informative to users; - Set seeds for consistency in the vignettes;
- There is often a mention of "chain sizes and lengths" - please explain the difference to users (and developers) if there is one. For example, when mentioning "a vector of chain sizes and lengths", the meaning is unclear - is it a single vector, and "size" and "length" are interchangeable, or is it two different vectors? Is the 'length' the same as the diameter of a network?,
- Contributing.md: Change references to {bpmodels}
Epichains classes
I don't think the classes add much and should be considered for removal, unless a suite of methods are planned that actually leverage the object signature.
Until they are removed:
- I would not allow the
<epichains_tree>
to inherit all the classes of the input data, as these could pull along classes such as<data.table>
,<tbl>
, etc. <epichains_summary>
validator: needs to be beefed up, currently only checks for the class<epichains_tree>
validator:- Move these checks, or add these checks, to the constructor as well
- Add checks to assert that other class members are of the expected types, this will be useful if/when you implement an
as.epichains_*()
method.
format.epichains_summary()
:- What is the vector being printed? Is it the vector of chain sizes?
- The maximum is not being printed as the list element name being accessed is wrong
Thanks for raising this. We have decided, after some brainstorming, to combine |
Co-authored-by: Hugo Gruson <[email protected]> Co-authored-by: Adam Kucharski <[email protected]>
Co-authored-by: Hugo Gruson <[email protected]>
Co-authored-by: Hugo Gruson <[email protected]>
Co-authored-by: Hugo Gruson <[email protected]>
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #122 +/- ##
==========================================
+ Coverage 98.64% 99.11% +0.46%
==========================================
Files 8 8
Lines 518 562 +44
==========================================
+ Hits 511 557 +46
+ Misses 7 5 -2 ☔ View full report in Codecov by Sentry. |
This is a full package review in preparation for the release of {epichains} v0.1.0 on GitHub.
Note the following
Please self-assign as a reviewer under the "Reviewers" tab on the right side of this window.
Expectations
All kinds of reviews are welcome either as line comments or comment blocks:
Deadline
I'm on leave until Dec 11 and will attend to reviews when I return and before the Christmas break on Dec 21. Ideally, all reviews should be posted by Dec 12, but please let me know if you need more time.
What's next after your review?
Thanks and looking forward to your reviews.