Split up epichains classes #107

jamesmbaazam · 2023-11-08T17:35:05Z

This PR closes #66, closes #78, and closes #79.

It does this by:

- splitting up the <epichains> class into <epichains_tree> and <epichains_summary>, which inherit from <data.frame> and <vector> respectively.
- adding a constructor, helper, and validator for each class.
- removing the <aggregate_epichains_df> class as it is no longer deemed necessary.
- providing each new class with its own print() and summary() methods.
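
For readers unfamiliar with the constructor/validator/helper pattern for S3 classes, here is a minimal sketch of what the split could look like. It is not the code in this PR: the column names (sim_id, infector_id, generation) and the function bodies are assumptions made for illustration, and a numeric vector stands in for <vector>.

```r
# Hypothetical sketch only; names and columns are assumptions, not the PR's code.

# Low-level constructor: puts the new class on top of data.frame.
new_epichains_tree <- function(x = data.frame()) {
  stopifnot(is.data.frame(x))
  structure(x, class = c("epichains_tree", class(x)))
}

# Validator: checks that the (assumed) required columns are present.
validate_epichains_tree <- function(x) {
  required <- c("sim_id", "infector_id", "generation")
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", toString(missing_cols), call. = FALSE)
  }
  invisible(x)
}

# Helper: user-facing entry point that constructs and then validates.
epichains_tree <- function(x) {
  out <- new_epichains_tree(x)
  validate_epichains_tree(out)
  out
}

# The summary class wraps a numeric vector instead of a data frame.
new_epichains_summary <- function(x = numeric()) {
  stopifnot(is.numeric(x))
  structure(x, class = c("epichains_summary", class(x)))
}

# Each class gets its own print method; this one shows the bare values.
print.epichains_summary <- function(x, ...) {
  cat("<epichains_summary> with", length(x), "value(s)\n")
  print(unclass(x))
  invisible(x)
}
```

Keeping the constructor cheap and putting the checks in a separate validator means internal code that already knows its input is well-formed can skip validation.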

codecov-commenter · 2023-11-08T17:38:09Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (0a28674) 98.63% compared to head (ca5eec9) 98.90%.

❗ Current head ca5eec9 differs from pull request most recent head 0c89a08. Consider uploading reports for the commit 0c89a08 to get more accurate results

Files	Patch %	Lines
R/epichains.R	98.36%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
+ Coverage   98.63%   98.90%   +0.27%     
==========================================
  Files           8        8              
  Lines         511      549      +38     
==========================================
+ Hits          504      543      +39     
+ Misses          7        6       -1

sbfnk

Looks good to me. I'm still slightly concerned if it's confusing to have both simulate_tree and simulate_summary. The main reason for simulate_summary to exist is that it's faster than simulate_tree and thus it's useful to have internally especially when approximating likelihoods where it may be run ~1,000,000s of times within e.g. an MCMC. But for the user I wonder if there is ever a situation where this would matter, and where we could otherwise have just a simulate function that returns the tree.

But I'm not sure, and I'm not sure what the best solution is. Perhaps something to come out when others review the package.

R/epichains.R

sbfnk · 2023-11-30T14:08:31Z

If sticking with the current setup (which may well be best) then I think it would be a good idea for summary.epichains_tree() to return an <epichains_summary> as suggested by @pratikunterwegs in #79 (comment)
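
A minimal sketch of that suggestion, assuming a tree is stored with one row per case and a hypothetical tree_id column, and taking tree size as the summary statistic. The real method in the package may compute something different.

```r
# Hypothetical sketch: summary() on a tree returns an <epichains_summary>.
summary.epichains_tree <- function(object, ...) {
  # Tree size = number of cases recorded for each tree (assumed layout).
  sizes <- as.numeric(table(object$tree_id))
  structure(sizes, class = c("epichains_summary", "numeric"))
}

# A toy tree object: tree 1 has three cases, tree 2 has one.
tree <- structure(
  data.frame(tree_id = c(1, 1, 1, 2), generation = c(1, 2, 2, 1)),
  class = c("epichains_tree", "data.frame")
)

s <- summary(tree)
inherits(s, "epichains_summary")
```

With this in place, summarising a simulated tree and simulating the summary directly would produce objects of the same class.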

jamesmbaazam · 2023-11-30T14:47:44Z

> Looks good to me. I'm still slightly concerned if it's confusing to have both simulate_tree and simulate_summary. The main reason for simulate_summary to exist is that it's faster than simulate_tree and thus it's useful to have internally especially when approximating likelihoods where it may be run ~1,000,000s of times within e.g. an MCMC. But for the user I wonder if there is ever a situation where this would matter, and where we could otherwise have just a simulate function that returns the tree. But I'm not sure, and I'm not sure what the best solution is. Perhaps something to come out when others review the package.

I have a few thoughts on alternatives.

1. I know having these two functions leads to the issue in "Code duplication as a result of splitting chain_sim()" (#44), and the solution could be to make simulate_summary() just a summarising helper for the results of simulate()/simulate_tree(). This might also affect our decision on whether to take on data.table as a dependency to fix the benchmarking issue in "chain_sim uses inefficient rbinding of data frames" (#8), which is being addressed in "not for merging: benchmark options for binding data frames in simulation" (#114). We could use <data.table> for the wrangling in the summary helper that I am proposing. The downside here is that users will always have to simulate things they don't need using simulate() before obtaining the summaries.
2. We could leave simulate_summary() but add the clarification, "The main reason for simulate_summary to exist is that it's faster than simulate_tree and thus it's useful to have internally especially when approximating likelihoods where it may be run ~1,000,000s of times within e.g. an MCMC". The downside is "Code duplication as a result of splitting chain_sim()" (#44).

I'm inclined to go with option 1.

sbfnk · 2023-11-30T14:50:46Z

I agree. One thing I've been wondering is how much of a speed difference there actually is. Perhaps we could add simulate_summary to the benchmark in #114?

jamesmbaazam · 2023-11-30T15:17:43Z

Definitely much faster. See #114 (comment).

sbfnk · 2023-12-01T09:02:43Z

Interesting! Another option would be:

3. A kind of half-way house between the two: rename simulate_tree to simulate and make simulate_summary internal only, used in epichains/R/stat_likelihoods.R (line 183 in 5633a04 begins "dist <- simulate_summary("). This has the benefit that the user interface is simpler, but the downside that it may make some use cases slower, so the question is how much we believe that others will want to simulate summaries only, e.g. for inference.

Still doesn't help with the code duplication unless there is some of it which could be turned into a function.

sbfnk · 2023-12-01T09:07:37Z

How about:

4. We keep things as they are but rename simulate_tree to simulate and then make summary(simulate) return the same type of output as simulate_summary? This would simplify the hierarchy and hopefully make it clear that simulate_summary is a faster shortcut to simulating the summary directly if that's all we care about. It doesn't help with "Code duplication as a result of splitting chain_sim()" (#44), but perhaps we can just accept that this (bit of) duplication is unavoidable.

jamesmbaazam · 2023-12-01T10:53:55Z

Am I wrong for interpreting #107 (comment) to mean option 4? 🤔

As in, simulate() returns an <epichains_tree> object and summary.epichains_tree() returns an <epichains_summary> object, which is the same class that simulate_summary() returns.

Also, will we be masking stats::simulate()?

sbfnk · 2023-12-01T11:14:05Z

> Am I wrong for interpreting #107 (comment) to mean option 4? 🤔

Yes, they're basically the same (+rename)

> Also, will we be masking stats::simulate()?

Ah yes. We could extend it (as it's a generic) but we'd have to construct an object to simulate from first.
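
To illustrate what "construct an object to simulate from first" would mean in practice, here is a rough sketch of a method for the stats::simulate() generic. The tree_spec class and its fields are hypothetical, and the method body only draws first-generation offspring counts from a Poisson distribution as a stand-in for a real tree simulation.

```r
# Hypothetical specification object to dispatch on; not part of the package.
tree_spec <- function(ntrees, lambda) {
  structure(list(ntrees = ntrees, lambda = lambda), class = "tree_spec")
}

# Method for the stats::simulate() generic, which has the signature
# simulate(object, nsim = 1, seed = NULL, ...).
simulate.tree_spec <- function(object, nsim = 1, seed = NULL, ...) {
  if (!is.null(seed)) set.seed(seed)
  # Illustration only: one vector of offspring counts per simulation run.
  replicate(nsim, rpois(object$ntrees, object$lambda), simplify = FALSE)
}

spec <- tree_spec(ntrees = 3, lambda = 2)
sims <- simulate(spec, nsim = 2, seed = 42)
length(sims)
```

Because the method extends the existing generic rather than defining a new simulate() function, nothing in stats is masked, but users must build the specification object before they can simulate anything.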

Given all of this my inclination would be to just go ahead with #107 (comment) and keep simulate_tree as is after all. But the decision is not clear-cut. What do you think?

jamesmbaazam · 2023-12-01T13:47:19Z

> Given all of this my inclination would be to just go ahead with #107 (comment) and keep simulate_tree as is after all. But the decision is not clear-cut. What do you think?

Agreed. Looking at the stats::simulate() generic, it looks more useful as a "scenario" simulation function.

I will go ahead with #107 (comment).

Thanks for helping to brainstorm.

pratikunterwegs

> We could leave simulate_summary() but add the clarification, "The main reason for simulate_summary to exist is that it's faster than simulate_tree and thus it's useful to have internally especially when approximating likelihoods where it may be run ~1,000,000s of times within e.g. an MCMC". Downside is #44.

Just looking into this PR as the instigator of #79 --- as an external developer, I haven't been able to understand the difference between simulate_tree() and simulate_summary(). Perhaps I'm confused because they both return objects with some shared inheritance, and I'm conditioned by R syntax to assume that a 'summary' is a condensed version of another object.

Would it help to rename simulate_summary() to say, sample_tree_metrics()? This would then reserve 'simulate' for code that gives the tree structure. However, running summary(simulate_tree()) should also give the same-ish output as sample_tree_metrics() for the same ntrees and offspring_dist.

Also, would it help to pick between 'tree' and 'chain' and use one, or do they mean different things?

jamesmbaazam · 2023-12-04T10:30:36Z

> Would it help to rename simulate_summary() to say, sample_tree_metrics()? This would then reserve 'simulate' for code that gives the tree structure. However, running summary(simulate_tree()) should also give the same-ish output as sample_tree_metrics() for the same ntrees and offspring_dist.

It's not really sampling. The current simulate_summary() function is running a stripped down version of simulate_tree() that only returns the chain_statistic of interest without tracking related information. Speaking out loud now, I think it's probably better to rename it to simulate_statistic(). I mentioned in the second bullet of #79 that the name simulate_summary() would need some reconsideration.
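
A rough sketch of what "stripped down" means here, assuming the statistic of interest is tree size. The function name simulate_size_only and the offspring_sampler argument are hypothetical; only ntrees and stat_max echo names used in this PR, and the loop is a generic branching process rather than the package's implementation.

```r
# Hypothetical sketch: simulate only the size of each tree, keeping no
# record of who infected whom or of generation membership.
simulate_size_only <- function(ntrees, offspring_sampler, stat_max = 100) {
  sizes <- numeric(ntrees)
  for (i in seq_len(ntrees)) {
    size <- 1    # the index case
    active <- 1  # number of cases in the current generation
    while (active > 0 && size < stat_max) {
      # Total offspring produced by the current generation.
      active <- sum(offspring_sampler(active))
      size <- size + active
    }
    # Truncate at stat_max so that large outbreaks stop early.
    sizes[i] <- min(size, stat_max)
  }
  sizes
}

set.seed(1)
sizes <- simulate_size_only(
  ntrees = 5,
  offspring_sampler = function(n) rpois(n, lambda = 0.5)
)
```

Only two counters are updated per generation, which is why a function of this shape can be much cheaper than building the full tree when it is called many times inside a likelihood approximation.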

Co-authored-by: Sebastian Funk

jamesmbaazam force-pushed the split-epichains-classes branch from 1e568e2 to f037500 Compare November 28, 2023 17:53

jamesmbaazam marked this pull request as ready for review November 28, 2023 17:58

jamesmbaazam requested a review from sbfnk November 28, 2023 18:22

sbfnk requested changes Nov 30, 2023

pratikunterwegs reviewed Dec 4, 2023

jamesmbaazam and others added 14 commits December 4, 2023 11:55

Use helper functions to create objects

fae6c1f

Remove epichains_aggregate_df class

1a1b9e3

Use new validation function

1564d35

Clean up documentation of aggregate method

f4b4ffc

Condense head and tail methods for new class

69e9cb2

Add constructor and helper for epichains_tree class

c3a4a4a

Add constructor for epichains_summary class

fbc4134

Add print and format methods for epichains_tree

8d32e18

Add print and format methods for epichains_summary class

e00182e

Add summary method for epichains_tree and epichains_summary

b10974f

Add validation checkers for the two classes

50c6edc

Generate new NAMESPACE

8eb6fbc

Add comments

bfcb19a

Bind right class to aggregate method

76fd61f

jamesmbaazam and others added 23 commits December 4, 2023 11:57

Loosen assertion for stat_max

ca581e7

Update snapshot tests

c47c58e

Bind epichains_tree class to aggregate method

cdca312

Styling to fix lintr issues

6d172db

Linting: Fix indentation

67f9294

Remove intvn_mean_reduction argument

a62c319

Improve documentation

eb81100

Remove intvn_mean_reduction

07b830a

Remove trailing comma

7a13621

Apply suggestions from code review

69a8611

Co-authored-by: Sebastian Funk

Rename chains_run to nchains

bd85c5d

Remove hardcoded superclass

f775ed5

Rename grouping_var to by

b883168

Rename nchains to ntrees

21d8bed

Revise function docs

b94fadb

Use new column names

e4cf58b

Reword function documentation

27af1fd

Rename variables for clarity

fea39e4

Remove rownames (got lost in merge conflicts)

9deb88c

Fix a doc

59a74e8

Remove unwanted variable

6e04463

Replace chain with trees in comments to remove confusion

93ab5e4

Update snaps

febd504

jamesmbaazam force-pushed the split-epichains-classes branch from 0c89a08 to febd504 Compare December 4, 2023 13:38

jamesmbaazam mentioned this pull request Dec 4, 2023

Name options for simulate_summary() #118

Closed

Return offspring as part of object

f46b001

jamesmbaazam mentioned this pull request Dec 4, 2023

summary.epichains_tree() should return an <epichains_summary> #119

Closed

jamesmbaazam merged commit e2c27a2 into main Dec 4, 2023

jamesmbaazam deleted the split-epichains-classes branch December 4, 2023 18:38

jamesmbaazam mentioned this pull request Dec 14, 2023

Full package review for epichains v0.1.0 release #122

Merged
