Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pathfinder fails for large case reports #728

Closed
jamesmbaazam opened this issue Jul 24, 2024 · 5 comments
Closed

pathfinder fails for large case reports #728

jamesmbaazam opened this issue Jul 24, 2024 · 5 comments

Comments

@jamesmbaazam
Copy link
Contributor

jamesmbaazam commented Jul 24, 2024

In my bid to get pathfinder running for the vignette in #695, I realised that it was quite likely failing because of the extremely large case reports in the simulated data (approx 40M at peak), so I decided to run a few data scenarios to see how the method performs with different case numbers. My suspicion has so far been confirmed, as the method performs well with case numbers between 100 and 10K, but starts to fail from there. Also notice that the credible intervals get narrower as the data magnitude increases. I am wondering why this is the case from the theoretical perspective, and if there is a way to improve the method to handle such large case numbers.

The log is quite rich, so I've attached it here for further discussion.
pathfinder_scenarios_log.txt

library(EpiNow2)
#> 
#> Attaching package: 'EpiNow2'
#> The following object is masked from 'package:stats':
#> 
#>     Gamma
library(data.table)
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:data.table':
#> 
#>     transpose
library(ggplot2)

# Function to simulate data scenarios
simulate_epidemic <- function(days, date_start, peak_day, max_cases, spread, seed) {
  set.seed(seed)
  # Calculate the cases for each day
  days_vec <- seq_len(days)
  cases <- max_cases * exp(-((days_vec - peak_day)^2) / (2 * spread^2))
  
  # Create a data frame for the result
  epidemic_curve <- data.frame(
    date = as.Date(date_start) + days_vec,
    confirm = ceiling(cases)
  )
  return(epidemic_curve)
}

# Scenarios for epidemi peaks
peaks <- c(100, 100E1, 100E2, 100E3, 100E4, 100E5)

# Create version of epinow that always succeeds
safe_epinow <- purrr::safely(epinow)
# Define epinow() inputs
## Generation time
generation_time <- Gamma(
  shape = Normal(1.3, 0.3),
  rate = Normal(0.37, 0.09),
  max = 14
)
# Run the simulations
pf_results <- lapply(peaks, function(peak) {
  test_data <- simulate_epidemic(
    seed = 123,
    days = 365,
    date_start = "2020-02-22",
    peak_day = 125,
    max_cases = peak,
    spread = 20
  )
  
  safe_epinow(
    data = as.data.table(test_data)[confirm > 1], # only the data is changing
    generation_time = generation_time_opts(generation_time),
    stan = stan_opts(method = "pathfinder", backend = "cmdstanr"),
    verbose = FALSE,
    logs = NULL
  )
})
# Annotate the results
names(pf_results) <- paste0("peak_", peaks)

# Scenarios that failed will return TRUE
sapply(pf_results, function(x) is.null(x$result))
#>   peak_100  peak_1000 peak_10000 peak_1e+05 peak_1e+06 peak_1e+07 
#>      FALSE      FALSE      FALSE       TRUE       TRUE       TRUE

# Plot those that succeeded
sapply(pf_results, function(x) {
  if (!is.null(x$result)) {
    x$result$plots$summary
  }
})
#> $peak_100

#> 
#> $peak_1000

#> 
#> $peak_10000

#> 
#> $`peak_1e+05`
#> NULL
#> 
#> $`peak_1e+06`
#> NULL
#> 
#> $`peak_1e+07`
#> NULL

Created on 2024-07-24 with reprex v2.1.1

@seabbs
Copy link
Contributor

seabbs commented Aug 21, 2024

Oh interesting. I assume this is because of numerical overruns for very large numbers (i.e because the prior distribution is multiplier case values a incidence of 40 million might allow some prior values to be 10^3 * 40 million) I would guess we could improve this by improving the prior distribution

@seabbs
Copy link
Contributor

seabbs commented Aug 21, 2024

or usinng a different default model where you enforce some concept of population size.

@seabbs
Copy link
Contributor

seabbs commented Aug 21, 2024

Did you look at see if NUTs has the same scaling problems? I would guess it would do but maybe not? If not that is interesting

@jamesmbaazam
Copy link
Contributor Author

Sorry, I missed this message. I'll try with NUTS and see.

@jamesmbaazam
Copy link
Contributor Author

This did not fail with NUTS. I'm not sure I want to keep this issue open though, so closing until it comes up again. We don't advise that users use pathfinder as a solver so that should be the recommendation when users face issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants