Remove normal approximation #153

adamkucharski · 2024-06-24T19:52:53Z

Please check if the PR fulfills these requirements

- I have read the CONTRIBUTING guidelines
- A new item has been added to NEWS.md
- Tests for the changes have been added (for bug fixes / features)
- Docs have been added / updated (for bug fixes / features)
- Checks have been run locally and pass

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

This addresses issues #152 and #151

What is the current behavior? (You can also link to an open issue here)

Instability with normal approximation in Ebola example

What is the new behavior (if this is a feature change)?

Normal approximation removed. This PR also updates tests for consistency with the removed functionality.

There are two additional changes:

- The example in the README now focuses on the early outbreak stage, where the CFR bias is greater and hence the importance of {cfr} functionality is more clearly illustrated.
- When expected outcomes to date <= total deaths, we now return NA for all estimates, to make it clearer to the user that this would not be a statistically valid calculation.

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No.

Given instability of the normal approximation for many values (especially given asymmetric likelihood), and because binomial implementation is quick, this is being removed to ensure accurate outputs.
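As a rough illustration of the instability mentioned above, the sketch below compares a normal-approximation (Wald) interval with an exact binomial interval when the observed proportion is close to 1. The counts are made up for illustration, and `binom.test()` (Clopper-Pearson) is used here only as a stand-in for a binomial approach; it is an assumption that it behaves similarly to the binomial implementation in {cfr}, not a description of the package internals.

```r
# Illustrative counts only (not taken from ebola1976)
deaths <- 19
cases <- 20

# Point estimate of the proportion
p_hat <- deaths / cases

# Normal-approximation (Wald) interval: symmetric around p_hat,
# so it can extend beyond the valid range [0, 1] near the boundary
se <- sqrt(p_hat * (1 - p_hat) / cases)
wald_ci <- p_hat + c(-1, 1) * qnorm(0.975) * se

# Exact binomial (Clopper-Pearson) interval: asymmetric, always within [0, 1]
exact_ci <- as.numeric(binom.test(deaths, cases)$conf.int)

print(wald_ci)  # upper limit exceeds 1
print(exact_ci) # stays within [0, 1]
```

With a proportion this close to 1, the symmetric interval spills over the upper bound, while the exact interval stays within the valid range.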


adamkucharski · 2024-06-24T20:01:04Z

Note: need to review the test functions to ensure consistency with updated messages and outputs.

Bisaloo · 2024-07-11T11:23:03Z

Could you run styler::style_pkg() in the folder containing the package please? This will ensure consistency in the indentation and make it easier to read & review the code.

adamkucharski · 2024-07-11T13:35:35Z

I've run styler::style_pkg() (latest commit). Is this something we should also incorporate into this issue on PR guidelines? Would it also make it easier for users to pass the linter checks?


avallecam · 2024-07-11T21:56:38Z

Sorry for the delayed reply. I just created this reprex to reproduce the README figure.

This PR resolves the issue with the point estimate and confidence interval. I would only add that the reprex also shows that some sections of the time series produce no estimates for certain date ranges, which is visible in the plot.

# pak::pak("epiverse-trace/cfr@remove-normal")

# Load package
library(cfr)
library(ggplot2)

# Calculate the static CFR without correcting for delays
cfr_static(data = ebola1976)
#>   severity_estimate severity_low severity_high
#> 1          0.955102    0.9210866     0.9773771


# Calculate the CFR without correcting for delays on each day of the outbreak
rolling_cfr_naive <- cfr_rolling(
  data = ebola1976
)
#> `cfr_rolling()` is a convenience function to help understand how additional data influences the overall (static) severity. Use `cfr_time_varying()` instead to estimate severity changes over the course of the outbreak.


# Calculate the rolling daily CFR while correcting for delays
rolling_cfr_corrected <- cfr_rolling(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)
#> `cfr_rolling()` is a convenience function to help understand how additional data influences the overall (static) severity. Use `cfr_time_varying()` instead to estimate severity changes over the course of the outbreak.
#> Some daily ratios of total deaths to total cases with known outcome are below 0.01%: some CFR estimates may be unreliable.FALSE

# combine the data for plotting
rolling_cfr_naive$method <- "naive"
rolling_cfr_corrected$method <- "corrected"

data_cfr <- rbind(
  rolling_cfr_naive,
  rolling_cfr_corrected
)

# visualise both corrected and uncorrected rolling estimates
ggplot(data_cfr) +
  geom_ribbon(
    aes(
      date,
      ymin = severity_low, ymax = severity_high,
      fill = method
    ),
    alpha = 0.2, show.legend = FALSE
  ) +
  geom_line(
    aes(date, severity_estimate, colour = method)
  ) +
  scale_colour_brewer(
    palette = "Dark2",
    labels = c("Corrected CFR", "Naive CFR"),
    name = NULL
  ) +
  scale_fill_brewer(
    palette = "Dark2"
  )
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).

Created on 2024-07-11 with reprex v2.1.0

avallecam

In case NA is not an expected outcome, we can use this reprex to check the behaviour after the corresponding fix. Here are the outputs of cfr_rolling() and of two cfr_static() calls with data up to specific dates.

# pak::pak("epiverse-trace/cfr@remove-normal")

# Load package
library(cfr)
library(dplyr)
library(lubridate)

cfr_rolling(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
) %>% 
  filter(is.na(severity_estimate))
#> `cfr_rolling()` is a convenience function to help understand how additional data influences the overall (static) severity. Use `cfr_time_varying()` instead to estimate severity changes over the course of the outbreak.
#> Some daily ratios of total deaths to total cases with known outcome are below 0.01%: some CFR estimates may be unreliable.FALSE
#>          date severity_estimate severity_low severity_high
#> 1  1976-08-25                NA           NA            NA
#> 2  1976-09-28                NA           NA            NA
#> 3  1976-09-29                NA           NA            NA
#> 4  1976-09-30                NA           NA            NA
#> 5  1976-10-01                NA           NA            NA
#> 6  1976-10-02                NA           NA            NA
#> 7  1976-10-03                NA           NA            NA
#> 8  1976-10-04                NA           NA            NA
#> 9  1976-10-05                NA           NA            NA
#> 10 1976-10-06                NA           NA            NA
#> 11 1976-10-07                NA           NA            NA
#> 12 1976-10-08                NA           NA            NA
#> 13 1976-10-15                NA           NA            NA

ebola1976 %>% 
  filter(date<=ymd(19761001)) %>% 
  cfr_static(delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33))
#> Total deaths = 140 and expected outcomes = 134 so setting expected outcomes = NA. If we were to assume
#>         total deaths = expected outcomes, it would produce an estimate of 1.
#>   severity_estimate severity_low severity_high
#> 1                NA           NA            NA

ebola1976 %>% 
  filter(date<=ymd(19761015)) %>% 
  cfr_static(delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33))
#> Total deaths = 214 and expected outcomes = 214 so setting expected outcomes = NA. If we were to assume
#>         total deaths = expected outcomes, it would produce an estimate of 1.
#>   severity_estimate severity_low severity_high
#> 1                NA           NA            NA

Created on 2024-07-11 with reprex v2.1.0

adamkucharski · 2024-07-12T06:11:43Z

Thanks for looking at this. Currently we have some situations where E(known outcomes) < deaths, and hence the likelihood as currently implemented isn't totally valid. One option would be to set E(known outcomes) = deaths in this situation (which some earlier versions of the code did). On reflection, though, it seemed clearest to return NA in this situation. I've outlined a potential longer-term solution in issue #154 (but this would require some more thought and potentially quite a lot of refactoring).
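The two options described here can be sketched as follows. `severity_point()` is a hypothetical helper written for this illustration and is not part of {cfr}; the simple ratio used for the valid case is an assumption standing in for the package's binomial estimate. The `<=` condition follows the commit message in this PR ("Output NA if total_outcomes<=total_deaths").

```r
# Hypothetical helper (not a {cfr} function): point estimate of severity
# given total deaths and expected outcomes to date.
severity_point <- function(total_deaths, expected_outcomes, cap = FALSE) {
  if (expected_outcomes <= total_deaths) {
    if (cap) {
      # Earlier option: set expected outcomes equal to deaths,
      # which forces the estimate to 1
      return(1)
    }
    # Behaviour in this PR: no statistically valid estimate, so return NA
    return(NA_real_)
  }
  # Valid case: deaths as a proportion of expected outcomes (assumed form)
  total_deaths / expected_outcomes
}

# Values taken from the messages in the reprex above
severity_point(140, 134)             # returns NA
severity_point(140, 134, cap = TRUE) # returns 1
severity_point(214, 214)             # returns NA, as equality also triggers the rule
```

Returning NA makes the invalid case explicit, whereas capping silently reports a severity of 1.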

avallecam · 2024-07-12T08:59:30Z

That is interesting; thank you for adding an explicit explanation. In the meantime, would it be valid to issue the message as a warning and, instead of NA, return the most recent valid estimate at a given date?

adamkucharski · 2024-07-15T07:24:42Z

It would require quite a lot of additional refactoring to output the last valid estimate, because the README example showcases multiple estimates, one at each point in time, in a visualisation (i.e. a loop over cfr_static()), rather than the output a user will commonly interact with (i.e. a single numerical estimate).

The current message is displayed above (e.g. Total deaths = 140 and expected outcomes = 134 so setting expected outcomes = NA. If we were to assume total deaths = expected outcomes, it would produce an estimate of 1.) But could edit if there's a better option?
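For anyone who wants the last valid estimate in their own report without changes to the package, a user-side post-processing step is one possibility. The sketch below is a minimal base R "last observation carried forward" helper; `fill_last_valid()` is a hypothetical name, and whether carrying an estimate forward is appropriate for a given analysis is left to the user.

```r
# Hypothetical helper (not part of {cfr}): replace each NA with the most
# recent non-NA value; leading NAs stay NA because there is nothing to carry.
fill_last_valid <- function(x) {
  valid <- x[!is.na(x)]          # the non-missing values, in order
  idx <- cumsum(!is.na(x))       # position of the latest valid value so far
  idx[idx == 0] <- NA            # no valid value seen yet
  valid[idx]
}

# Toy vector standing in for a `severity_estimate` column
estimates <- c(NA, 0.5, NA, NA, 0.7, NA)
fill_last_valid(estimates)
#> [1]  NA 0.5 0.5 0.5 0.7 0.7
```

Applied to the output of cfr_rolling(), this would be run separately on each of the estimate and interval columns.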

avallecam · 2024-07-15T09:51:26Z

> The current message is displayed above (e.g. Total deaths = 140 and expected outcomes = 134 so setting expected outcomes = NA. If we were to assume total deaths = expected outcomes, it would produce an estimate of 1.) But could edit if there's a better option?

The current message is appropriate and specific. It reflects that the NA is produced in an extreme scenario, as described in #154. Given that this PR already solved the key issue, I'll move on with the approval.

As a complementary comment, I suggest adding an explicit next step for the user to the message. If, in an ongoing outbreak, we create a reproducible sitrep and suddenly get an NA result, a companion plot from cfr_rolling() would give the full view.

We could add something like:

Use `cfr_rolling()` to understand how additional data influences the overall (static) severity.
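As a rough sketch of how the combined message might read, the hypothetical helper below appends the suggested next step to the wording quoted earlier in this thread. `na_outcome_message()` is an invented name for illustration and does not reflect how {cfr} builds its messages.

```r
# Hypothetical helper (not part of {cfr}): compose the existing message
# plus the suggested pointer to cfr_rolling().
na_outcome_message <- function(total_deaths, expected_outcomes) {
  paste0(
    "Total deaths = ", total_deaths,
    " and expected outcomes = ", expected_outcomes,
    " so setting expected outcomes = NA. ",
    "If we were to assume total deaths = expected outcomes, ",
    "it would produce an estimate of 1. ",
    "Use `cfr_rolling()` to understand how additional data influences ",
    "the overall (static) severity."
  )
}

# Emit the message for the values seen in the reprex above
message(na_outcome_message(140, 134))
```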

avallecam

🚀 ready to merge given that this PR solved the main issue.

As noted in my earlier comment on #153, we could optionally add a next step to the message shown for NA outputs in extreme situations, such as the one found with the Ebola 1976 data, until #154 is resolved.

adamkucharski · 2024-07-17T09:39:55Z

Thanks, I have merged this and will create an issue for the cfr_rolling() suggestion above.

adamkucharski added 4 commits June 24, 2024 17:05

Remove normal approximation

ac17241

Given instability of the normal approximation for many values (especially given asymmetric likelihood), and because binomial implementation is quick, this is being removed to ensure accurate outputs.

Update documentation

de13f2a

Output NA if total_outcomes<=total_deaths

Update README example

b697a70

Focus on early stage

Fix typo and add news

5de4572

adamkucharski requested a review from avallecam June 24, 2024 19:52

adamkucharski marked this pull request as draft June 24, 2024 19:59

Update tests and linting

e79f954

adamkucharski marked this pull request as ready for review July 1, 2024 07:58

adamkucharski added 2 commits July 2, 2024 11:16

Address linting issues

fa4129f

Fix remaining linting

83917ff

adamkucharski marked this pull request as draft July 2, 2024 10:49

adamkucharski marked this pull request as ready for review July 2, 2024 10:49

adamkucharski and others added 2 commits July 11, 2024 14:24

Run styler

8fce7b7

Automatic readme update

6311a27

Remove pkgdown: as_is: true

cd3986d

Knitted vignettes appear to show equations OK once removed.

avallecam reviewed Jul 11, 2024

adamkucharski mentioned this pull request Jul 12, 2024

Difference in the time series of the delay-adjusted rolling Ebola results #152

Open

avallecam approved these changes Jul 15, 2024

adamkucharski merged commit 29ee12a into main Jul 17, 2024
8 checks passed

adamkucharski deleted the remove-normal branch July 17, 2024 09:38
