Full package review for epichains v0.1.0 release #122

jamesmbaazam · 2023-12-04T23:12:09Z

This is a full package review in preparation for the release of {epichains} v0.1.0 on GitHub.

Note the following

Please self-assign as a reviewer under the "Reviewers" tab on the right side of this window.

Expectations

All kinds of reviews are welcome either as line comments or comment blocks:

Ideas for model enhancements
Enhancements to enable integration with other packages
Code improvements
Comments to improve usability
Grammatical issues

Deadline

I'm on leave until Dec 11 and will attend to reviews when I return and before the Christmas break on Dec 21. Ideally, all reviews should be posted by Dec 12, but please let me know if you need more time.

What's next after your review?

Only major comments will be addressed before the minor release, after which other issues will be addressed in future release cycles.
This PR will be closed when all relevant issues have been resolved. Closing this PR will not close any issues raised but will serve as a reference for raising issues to be resolved in subsequent pull requests.
Some issues have already been raised in earlier reviews and this full review is a move to solicit agreement or extra comments. If your comments relate to the already raised issues, please link them using appropriate GitHub features.

Thanks and looking forward to your reviews.

pratikunterwegs · 2023-12-05T08:59:29Z

Thanks @jamesmbaazam - will be reviewing this ~~Weds 6 Dec - Thurs 7 Dec~~ hoping to get this done tomorrow. Happy to coordinate with other reviewers.

Bisaloo

Thanks for your work on this! I think it's overall in good shape. I've made some comments inline relating to design, performance, documentation & good practices.

In terms of design & usability, I have two main concerns:

Some arguments passed through ... (e.g., in likelihood()) are forwarded to child functions several levels deep. This makes it very difficult to know which arguments this function takes and where they are documented.
The nobs_offspring argument is a good example of this. It's required if you pass a distribution that epichains doesn't know, but this is not explicit in the function documentation. And as far as I can tell, you cannot reach the help page describing it from the links in likelihood().
I am not yet completely sure about the solution for this. Maybe make these arguments explicit rather than implicit in the ellipses?
The fact that we have 3 very similar functions, with similar names, similar scopes and similar arguments feels confusing to me.
From a conceptual point of view, simulate_tree() feels (is?) the same as simulate_from_pop(pop = Inf) and simulate_summary() feels like a downstream analysis function that should be applied to the output of simulate_tree().
I understand they are separate for technical reasons. I understand that simulate_summary() exists to circumvent the memory issue that arises from the fact we're working with exponential growth processes. I understand simulate_from_pop() and simulate_tree() have slightly different implementations.
But it's still probably a problem to let technical considerations inform the design of our user interface. Even if we had to, e.g., internally dispatch to simulate_from_pop() or simulate_tree(), there may be value in having a single user-facing wrapper function.
Tangentially but on a related note, I don't follow why the output of simulate_from_pop() has fewer columns than the one from simulate_tree().
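A minimal base-R sketch of the first concern, assuming a hypothetical chain_size_loglik() (illustrative only, not the epichains API): the required argument is named in the signature, so it shows up in the function's own help page, and a missing value fails early with a message that names it. The Poisson branch uses the standard result that chain sizes from a Poisson(lambda) offspring distribution with one index case follow a Borel(lambda) distribution.

```r
# Hypothetical sketch: `chain_size_loglik()` and its arguments are
# illustrative names, not the epichains API.
chain_size_loglik <- function(chain_sizes, offspring_dist, lambda,
                              nobs_offspring = NULL) {
  stopifnot(is.numeric(chain_sizes), all(chain_sizes >= 1))
  if (offspring_dist == "pois") {
    # Closed form: Borel log-pmf,
    # log P(n) = -lambda * n + (n - 1) * log(lambda * n) - log(n!)
    n <- chain_sizes
    return(sum(-lambda * n + (n - 1) * log(lambda * n) - lgamma(n + 1)))
  }
  # Any other distribution would need a simulation-based fallback that
  # depends on `nobs_offspring`. Checking it here, in the user-facing
  # function, avoids hiding the requirement inside `...`.
  if (is.null(nobs_offspring)) {
    stop(
      "`nobs_offspring` is required when `offspring_dist` is \"",
      offspring_dist, "\".",
      call. = FALSE
    )
  }
  stop("The simulation-based likelihood is not implemented in this sketch.",
       call. = FALSE)
}

chain_size_loglik(c(1, 2, 5), offspring_dist = "pois", lambda = 0.5)
```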

DESCRIPTION

NEWS.md

R/borel.r

R/stat_likelihoods.R

tests/testthat/test-likelihood.R

tests/testthat/test-stat_likelihoods.R

vignettes/projecting_incidence.Rmd

R/simulate.r

R/epichains.R

adamkucharski

Thanks for putting all this together. Have added some comments mostly on vignettes – in particular, think there are some tweaks we could make to improve the entry point for new users and highlight complementary aspects of other packages.

vignettes/interventions.Rmd

vignettes/epichains.Rmd

vignettes/branching_process_literature.Rmd

README.Rmd

vignettes/epichains.Rmd

vignettes/projecting_incidence.Rmd

pratikunterwegs

Thanks @jamesmbaazam for opening this review. I think the package looks mostly alright. I haven't looked into the tests and the vignettes, and I'm laying out some general points below.

General

I wonder whether the three simulate_*() functions can be rolled into one. I'm not able to tell why simulate_tree() and simulate_tree_from_pop() are different. It should be possible to have both an initial number of cases, and a finite population size.
The simulate_summary() function can probably be removed in favour of a summary.epichains() method for other simulate_*() outputs. This would probably cut down on the code and I don't feel that simulate_tree() is too slow to run.
I am not convinced that this package benefits from S3 classes, as they are very thin wrappers around standard structures. One good reason to have a custom class is for methods such as summary.epichains() (see point above), or methods to link it with {igraph}/{tidygraph} or {epicontacts}. This could still be done by returning data.frames in an edgelist format (from, to, edge strength) which can be ingested by e.g. igraph::graph_from_data_frame() - this opens the door to easy plotting.
The vignette on interventions should be rephrased or removed, as the examples are not modelling reactive interventions, but rather scenarios with different pathogen or public-health parameters. It should be possible to implement an intervention that reduces a model parameter between some timepoints (see <rate_interventions> in {epidemics}), but this would require you to keep track of the model time separately from the infection's generation time, and move the code in a more (discrete?) time-based direction.
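A minimal base-R sketch of the single-function idea (an initial number of cases plus an optional finite population), assuming a hypothetical simulate_chains_sketch() with Poisson offspring and a crude depletion rule: the offspring mean is scaled by the susceptible fraction and new infections are capped at the remaining susceptibles. This is not how epichains implements either function; it only illustrates that pop = Inf recovers the unbounded case, and that the output can be a plain edge-list data.frame.

```r
# Hypothetical sketch: `simulate_chains_sketch()` and its arguments are
# illustrative names, not the epichains API.
simulate_chains_sketch <- function(n_chains, R0, pop = Inf,
                                   max_generations = 10) {
  stopifnot(n_chains >= 1, R0 >= 0, pop >= n_chains, max_generations >= 1)
  # Edge list accumulated as a numeric matrix; converted once at the end.
  edges <- matrix(
    numeric(0), ncol = 4,
    dimnames = list(NULL, c("from", "to", "chain", "generation"))
  )
  infectors <- seq_len(n_chains)  # index cases get ids 1..n_chains
  chain_of <- seq_len(n_chains)   # chain id of each current infector
  susceptible <- pop - n_chains
  next_id <- n_chains + 1
  generation <- 1                 # generation of the current infectors
  while (length(infectors) > 0 && generation < max_generations) {
    # With pop = Inf the susceptible fraction stays at 1, which recovers
    # the unbounded branching process.
    susc_frac <- if (is.infinite(pop)) 1 else susceptible / pop
    offspring <- rpois(length(infectors), lambda = R0 * susc_frac)
    # Never infect more people than remain susceptible: drop any excess
    # infections at random across infectors.
    excess <- sum(offspring) - susceptible
    if (excess > 0) {
      slots <- rep(seq_along(offspring), offspring)
      slots <- slots[-sample(length(slots), excess)]
      offspring <- tabulate(slots, nbins = length(offspring))
    }
    n_new <- sum(offspring)
    if (n_new == 0) break
    new_ids <- seq(next_id, length.out = n_new)
    new_chain <- rep(chain_of, offspring)
    edges <- rbind(edges, cbind(
      from = rep(infectors, offspring), to = new_ids,
      chain = new_chain, generation = generation + 1
    ))
    infectors <- new_ids
    chain_of <- new_chain
    susceptible <- susceptible - n_new
    next_id <- next_id + n_new
    generation <- generation + 1
  }
  as.data.frame(edges)
}

set.seed(1)
head(simulate_chains_sketch(n_chains = 5, R0 = 3, pop = 20))
```

Because from and to are the first two columns, a result like this could be passed straight to igraph::graph_from_data_frame(); index cases only appear in the from column.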

General technical

Bring package test coverage up to 100%;
Update the WORDLIST and enable R CMD check failure on spelling errors;
Run {styler};
Ensure that the GH workflows are up to date with {packagetemplate}, I haven't checked if they are;
Go through the spring cleaning tasklist and implement any fixes needed; see also the {usethis} release checklist: e.g. add examples for all exported functions;
Suggest numeric vectors should never have infinite values; check all instances of checkmate::assert_*() and especially do not let the upper value of numerics be Inf;
Set all optional arguments to NULL rather than missing, check instances of this, e.g. R/likelihood.R;
Mention the type of all function arguments and returns in the documentation. E.g. for likelihood(), @param chains A numeric vector representing chain sizes or lengths.;
Be consistent with the use of return();
Prefer the use of checkmate::assert_*() or stopifnot(checkmate::test_*()) rather than if(!checkmate::test_*()) stop();
Avoid comments inside temporary functions generated within other function bodies;
Prefer rbind(matrix, matrix) and use matrices rather than data.frames in while loops (this is likely more efficient); you can convert to a data.frame at the end;
Set a seed for consistency, and set parameter values that result in sizeable outbreaks in function documentation examples; otherwise examples often return empty <epichains_tree>s because the outbreak does not take off (this might be realistic but is not very informative to users);
Set seeds for consistency in the vignettes;
There is often a mention of "chain sizes and lengths" - please explain the difference to users (and developers) if there is one. For example, when mentioning "a vector of chain sizes and lengths", the meaning is unclear - is it a single vector, and "size" and "length" are interchangeable, or is it two different vectors? Is the 'length' the same as the diameter of a network?;
Contributing.md: Change references to {bpmodels}
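A minimal base-R sketch of the while-loop and seed suggestions, using a hypothetical grow_generations() helper (not part of epichains) that only tracks the number of cases per generation under an assumed Poisson offspring distribution.

```r
# Hypothetical helper for illustration only. Results are accumulated in a
# numeric matrix inside the loop and converted to a data.frame once.
grow_generations <- function(n_index, R0, max_gen = 20) {
  out <- matrix(c(1, n_index), ncol = 2,
                dimnames = list(NULL, c("generation", "cases")))
  cases <- n_index
  gen <- 1
  while (cases > 0 && gen < max_gen) {
    gen <- gen + 1
    # The sum of `cases` iid Poisson(R0) draws is Poisson(R0 * cases).
    cases <- rpois(1, R0 * cases)
    # Matrix rbind is typically much cheaper than rbind.data.frame.
    out <- rbind(out, c(gen, cases))
  }
  as.data.frame(out)
}

set.seed(42)  # fixed seed so the example output is reproducible
grow_generations(n_index = 10, R0 = 1.2)
```

Growing a matrix still copies it on every iteration; if an upper bound on the number of rows is known (here max_gen), preallocating the matrix and trimming it at the end avoids that.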

Epichains classes

I don't think the classes add much and they should be considered for removal, unless a suite of methods is planned that actually leverages the object signature.
Until they are removed:

I would not allow the <epichains_tree> to inherit all the classes of the input data, as these could pull along classes such as <data.table>, <tbl>, etc.
<epichains_summary> validator: needs to be beefed up, currently only checks for the class
<epichains_tree> validator:
- Move these checks, or add these checks, to the constructor as well
- Add checks to assert that other class members are of the expected types, this will be useful if/when you implement an as.epichains_*() method.
format.epichains_summary():
- What is the vector being printed? Is it the vector of chain sizes?
- The maximum is not being printed as the list element name being accessed is wrong
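A minimal base-R sketch of a constructor that calls a fuller validator, plus a format() method that reads the maximum by its exact attribute name. The class name chain_summary and its layout (a numeric vector with statistic and max_stat attributes) are assumptions for illustration and may not match the actual <epichains_summary> object.

```r
# Hypothetical class layout: a numeric vector of per-chain statistics with
# `statistic` and `max_stat` attributes. Only one class is set, so no
# classes are inherited from the input data.
new_chain_summary <- function(x, statistic = c("size", "length"),
                              max_stat) {
  statistic <- match.arg(statistic)
  obj <- structure(x, class = "chain_summary",
                   statistic = statistic, max_stat = max_stat)
  validate_chain_summary(obj)  # the constructor runs the same checks
}

validate_chain_summary <- function(x) {
  m <- attr(x, "max_stat")
  stopifnot(
    "must inherit from <chain_summary>" = inherits(x, "chain_summary"),
    "values must be non-negative numbers" =
      is.numeric(x) && all(as.numeric(x) >= 0),
    "`statistic` must be \"size\" or \"length\"" =
      isTRUE(attr(x, "statistic") %in% c("size", "length")),
    "`max_stat` must be a single finite positive number" =
      is.numeric(m) && length(m) == 1 && is.finite(m) && m > 0
  )
  invisible(x)
}

format.chain_summary <- function(x, ...) {
  stat <- attr(x, "statistic")
  max_stat <- attr(x, "max_stat")  # read by its exact attribute name
  vals <- as.numeric(x)
  c(
    sprintf("Chain %ss (n = %d):", stat, length(vals)),
    paste(vals, collapse = " "),
    sprintf("Maximum %s tracked: %s", stat, format(max_stat))
  )
}

print.chain_summary <- function(x, ...) {
  writeLines(format(x, ...))
  invisible(x)
}

new_chain_summary(c(1, 3, 2), statistic = "size", max_stat = 100)
```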

R/simulate.r

R/epichains-package.R

R/epichains.R

R/helpers.R

R/likelihood.R

R/simulate.r

jamesmbaazam · 2023-12-12T12:39:48Z

> The fact that we have 3 very similar functions, with similar names, similar scopes and similar arguments feels confusing to me.
> From a conceptual point of view, simulate_tree() feels (is?) the same as simulate_from_pop(pop = Inf) and simulate_summary() feels like a downstream analysis function that should be applied to the output of simulate_tree().
> I understand they are separate for technical reasons. I understand that simulate_summary() exists to circumvent the memory issue that arises from the fact we're working with exponential growth processes. I understand simulate_from_pop() and simulate_tree() have slightly different implementations.
> But it's still probably a problem to let technical considerations inform the design of our user interface. Even if we had to, e.g., internally dispatch to simulate_from_pop() or simulate_tree(), there may be value in having a single user-facing wrapper function.
> Tangentially but on a related note, I don't follow why the output of simulate_from_pop() has fewer columns than the one from simulate_tree().

Thanks for raising this. After some brainstorming, we have decided to combine simulate_tree() and simulate_tree_from_pop() into a single function and to make simulate_summary() internal (with a better name). I will detail this in the design document, which has been in a draft state for a while now. Your review would be welcome.


codecov-commenter · 2023-12-15T17:24:54Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (30b68a0) 98.64% compared to head (41adfb9) 99.11%.
Report is 42 commits behind head on main.

❗ Current head 41adfb9 differs from pull request most recent head a3b8022. Consider uploading reports for the commit a3b8022 to get more accurate results

Files	Patch %	Lines
R/epichains.R	98.50%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #122      +/-   ##
==========================================
+ Coverage   98.64%   99.11%   +0.46%     
==========================================
  Files           8        8              
  Lines         518      562      +44     
==========================================
+ Hits          511      557      +46     
+ Misses          7        5       -2


close #127

jamesmbaazam added the Full package review label Dec 4, 2023

jamesmbaazam added this to the v0.1.0 milestone Dec 4, 2023

jamesmbaazam requested review from pratikunterwegs and Bisaloo December 4, 2023 23:26

Bisaloo reviewed Dec 7, 2023


Bisaloo reviewed Dec 8, 2023


adamkucharski self-requested a review December 8, 2023 08:27

adamkucharski approved these changes Dec 8, 2023


pratikunterwegs reviewed Dec 8, 2023


jamesmbaazam mentioned this pull request Dec 12, 2023

Make required arguments passed through ... explicit #125

Open

jamesmbaazam and others added 16 commits December 15, 2023 15:38

Apply suggestions from code review

33405ae

Co-authored-by: Hugo Gruson <[email protected]> Co-authored-by: Adam Kucharski <[email protected]>

Automatic readme update

95d3a65

Apply suggestions: Reword ... arg description

dc59cfb

Co-authored-by: Hugo Gruson <[email protected]>

Clarify phrase about how generation_time can be specified

721468d

Co-authored-by: Hugo Gruson <[email protected]>

Remove sprintf call from stop() message

33c2344

Co-authored-by: Hugo Gruson <[email protected]>

Remove redundant check

08fc5fa

Generate documentation

c2889c3

Rename count variable and convert result to tibble

0070ebe

Remove unnecessary lambda from lapply

6aa9d4a

Label count column in raw data as cases.

f0ba812

Revert use of setdiff

468312c

Generate docs

0041c98

Fix spelling and update wordlist. Fixes #144.

1640a77

Linting

b3a35a4

Fix test

22ad47a

Update snapshots

b80c1cf

jamesmbaazam force-pushed the review branch from e2cdb0a to b80c1cf Compare December 15, 2023 17:20

jamesmbaazam and others added 7 commits December 15, 2023 19:41

Remove unnecessary badges to fix #164

8264254

Delete CITATION file to fix #129

794a145

Use dev version

985c3f3

Clean up NEWS

a2f1813

Move truncdist to Imports to fix #138

0d2733b

Remove unnecessary packages from Suggests and Remotes to close #126 and close #127

caa78d6

Automatic readme update

a3b8022

jamesmbaazam closed this Dec 15, 2023

jamesmbaazam reopened this Dec 15, 2023

jamesmbaazam merged commit f5eed5b into main Dec 15, 2023
8 checks passed

jamesmbaazam deleted the review branch December 15, 2023 20:07

jamesmbaazam mentioned this pull request Dec 15, 2023

Change scripts with .r extension to .R for consistency #135

Closed
