
Sum loglikelihoods before exponentiating #105

Merged 1 commit into main on Nov 10, 2023
Conversation

@Bisaloo (Member) commented Nov 8, 2023

The order of operations at the end of likelihood() has been switched: log-likelihoods are now summed before exponentiating, which also highlighted the opportunity to use a single vectorized exp() call in some cases.
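To make the switch concrete, here is a minimal standalone sketch of the idea (generic names, not the actual epichains internals): summing log-likelihoods and exponentiating once is mathematically equivalent to exponentiating each term and then multiplying, but needs only a single exp() call.

```r
# Per-chain log-likelihoods (illustrative values)
logliks <- c(-3.2, -1.7, -4.5)

# Old order: exponentiate each term, then multiply
old_way <- prod(exp(logliks))

# New order: sum on the log scale, then exponentiate once
new_way <- exp(sum(logliks))

all.equal(old_way, new_way)
#> [1] TRUE
```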

The new code should be:

Faster (when log = FALSE, individual = FALSE)

library(microbenchmark)
library(epichains)

# randomly generate 100 chains of size 1 to 10
chain_sizes <- sample(1:10, 100, replace = TRUE)

microbenchmark(
  likelihood(
    chains = chain_sizes, statistic = "size",
    offspring_dist = "pois", nsim_obs = 1e4, lambda = 0.5, obs_prob = 0.5
  ),
  likelihood_old(
    chains = chain_sizes, statistic = "size",
    offspring_dist = "pois", nsim_obs = 1e4, lambda = 0.5, obs_prob = 0.5
  ),
  times = 1e3
)
#> Unit: milliseconds
#>                                                                                                                                    expr
#>      likelihood(chains = chain_sizes, statistic = "size", offspring_dist = "pois",      nsim_obs = 10000, lambda = 0.5, obs_prob = 0.5)
#>  likelihood_old(chains = chain_sizes, statistic = "size", offspring_dist = "pois",      nsim_obs = 10000, lambda = 0.5, obs_prob = 0.5)
#>       min       lq     mean   median       uq      max neval
#>  273.8103 312.9286 341.6483 332.5475 357.6596 609.6394  1000
#>  277.9916 312.7059 340.9575 332.5310 357.6125 589.5059  1000

microbenchmark(
  likelihood(
    chains = chain_sizes, statistic = "size",
    offspring_dist = "pois", nsim_obs = 1e4, lambda = 0.5, log = FALSE, obs_prob = 0.5
  ),
  likelihood_old(
    chains = chain_sizes, statistic = "size",
    offspring_dist = "pois", nsim_obs = 1e4, lambda = 0.5, log = FALSE, obs_prob = 0.5
  ),
  times = 1e3
)
#> Unit: milliseconds
#>                                                                                                                                                 expr
#>      likelihood(chains = chain_sizes, statistic = "size", offspring_dist = "pois",      nsim_obs = 10000, lambda = 0.5, log = FALSE, obs_prob = 0.5)
#>  likelihood_old(chains = chain_sizes, statistic = "size", offspring_dist = "pois",      nsim_obs = 10000, lambda = 0.5, log = FALSE, obs_prob = 0.5)
#>       min       lq     mean   median       uq      max neval
#>  280.8126 318.0650 351.5412 339.0232 366.4869 712.5673  1000
#>  295.8321 327.4035 363.9978 348.9071 373.9444 685.0759  1000

microbenchmark(
  likelihood(
    chains = chain_sizes, statistic = "size",
    offspring_dist = "pois", nsim_obs = 1e4, lambda = 0.5, log = FALSE,
    individual = TRUE, obs_prob = 0.5
  ),
  likelihood_old(
    chains = chain_sizes, statistic = "size",
    offspring_dist = "pois", nsim_obs = 1e4, lambda = 0.5, log = FALSE,
    individual = TRUE, obs_prob = 0.5
  ),
  times = 1e3
)
#> Unit: milliseconds
#>                                                                                                                                                                         expr
#>      likelihood(chains = chain_sizes, statistic = "size", offspring_dist = "pois",      nsim_obs = 10000, lambda = 0.5, log = FALSE, individual = TRUE,      obs_prob = 0.5)
#>  likelihood_old(chains = chain_sizes, statistic = "size", offspring_dist = "pois",      nsim_obs = 10000, lambda = 0.5, log = FALSE, individual = TRUE,      obs_prob = 0.5)
#>       min       lq     mean   median       uq      max neval
#>  287.1338 308.0223 334.6865 335.8227 355.4666 544.8965  1000
#>  286.0165 306.2436 334.0689 333.8073 353.5573 664.2069  1000

Created on 2023-11-06 with reprex v2.0.2

More robust

Working with very small values, such as those produced by exponentiating a large negative number, is often problematic: they can underflow to zero.
This is one of the reasons log-likelihoods are used in the first place, as discussed in this SO thread.
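As an illustration (a standalone sketch, not epichains code): exponentiating large negative log-likelihoods individually underflows to zero, whereas the sum on the log scale stays exact.

```r
# Each term underflows: exp(-800) is smaller than the smallest
# representable double (~5e-324), so it evaluates to exactly 0.
logliks <- rep(-800, 3)

prod(exp(logliks))  # product of underflowed terms is 0
#> [1] 0

sum(logliks)  # summing on the log scale first keeps the exact value
#> [1] -2400
```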

Simpler

An ifelse() call has been removed.

Note

This seems like a minimal improvement, so I'm submitting it as a draft PR to get opinions. I'm fine with either outcome (closed without merging, or merged).

@Bisaloo Bisaloo marked this pull request as draft November 8, 2023 10:51
@jamesmbaazam (Member) commented:
Thanks @Bisaloo. This was raised in #65 so this PR is definitely welcome. I'll review the changes.

@jamesmbaazam jamesmbaazam marked this pull request as ready for review November 10, 2023 09:19
@codecov-commenter commented:
Codecov Report

Merging #105 (58146a4) into main (ee2b201) will decrease coverage by 0.24%.
The diff coverage is 83.33%.

❗ Current head 58146a4 differs from pull request most recent head 9a748f3. Consider uploading reports for the commit 9a748f3 to get more accurate results

@@            Coverage Diff             @@
##             main     #105      +/-   ##
==========================================
- Coverage   98.82%   98.58%   -0.24%     
==========================================
  Files           8        8              
  Lines         424      425       +1     
==========================================
  Hits          419      419              
- Misses          5        6       +1     
Files Coverage Δ
R/likelihood.R 98.33% <83.33%> (-1.67%) ⬇️


@jamesmbaazam (Member) commented Nov 10, 2023

Thanks for the PR @Bisaloo. LGTM. I'm going to rebase and merge now.

@jamesmbaazam jamesmbaazam merged commit fd81398 into main Nov 10, 2023
7 checks passed
@jamesmbaazam jamesmbaazam deleted the sum-before-exp branch November 10, 2023 15:46
3 participants