Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 0.1.14 #191

Merged
merged 7 commits into from
Mar 11, 2024
Merged

Release 0.1.14 #191

merged 7 commits into from
Mar 11, 2024

Conversation

plietar
Copy link
Member

@plietar plietar commented Mar 8, 2024

No description provided.

By default, when restoring the simulation state all previous schedule on
events are cleared and restored from the saved state. This goes against
use cases that wish to resume a simulation with different intervention
schedules to compare effects. In those use cases, a different
initialization sequence is used when creating the simulation, and we do
not want that to be cleared and overwritten.

The new `restore` flag, when set to false, overrides this default
behaviour and the state of an Event is (mostly) unaffected by a restore.
Thanks to this, a new event schedule, that is unrelated to the schedule
of the original run, can be configured.
I changed a function signature in mrc-ide#180, but had not
regenerated the `.Rd` files.
Couple of hotspots found while running malariasimulation with larger
population size. This PR gives a 10% speedup when using population size
of 1M.

The most suprising change comes from a modification to the signature of
Rcpp-exported functions. Some of them used to accept a `const
std::vector` as an argument. However when doing so Rcpp creates an extra
unexpected copy in the wrapping code. We end up with two copies, once
from R memory to an std::vector in the generated code, and a second one
when the method is called. If argument is non-const, Rcpp does not
create the intermediate object and the copy is avoided.

This behaviour comes from the difference between the `InputParameter`
and `ConstInputParameter` classes in Rcpp's [InputParameter.h] file.

Other changes include:
- Return a constant reference from `NumericVariable::get_values`,
  instead of a copy.
- Use `std::move` inside `NumericVariable::queue_update` instead of a
  copy.
- Use `reserve` and `push_back` on a vector instead of filling it with
  zeros to avoid an unnecessary memset.

With this, most operations that work with values require only one copy,
where they used to require two or three.

There are a couple more places that could be optimised further by
working directly with R vectors instead of `std::vector`, but they are
more intrusive changes and don't appear as significantly in
malariasimulation profiles anyway. I've left these out for now.

[InputParameter.h]: https://github.com/RcppCore/Rcpp/blob/c63bae6dea4e3994e6f334612c7837fa18ac1e06/inst/include/Rcpp/InputParameter.h
R6 object have significant overhead, especially when instantiating the
objects. While this is acceptable for long-lived objects such as events
or variables, bitsets are created and destroyed very regularly during
simulations.

We can replace our use of an R6 class for `Bitset` with named lists that
are intended to look and feel just like the original API, but which
significant performance improvement. The reference semantics provided by
R6 don't matter in our case, since all mutability happens behind the
external pointer.

On malariasimulation, I get a 30-35% performance improvement when using
this new implementation of Bitset on population sizes under 10k, and
about 10% speedup at 100k.

The object-oriented named list based interface still adds a bit of
overhead compared to using the externalptr and Rcpp functions directly,
but doing so requires intrusive changes in the use site.
The existing bitset sampling implementation works by using a binomial
distribution to decide how many bits to keep, randomly chooses the
indices of those bits, sorts the vector of indices and finally iterates
over all the bits one by one to clear those not contained in the vector.

This can be very inefficient, in particular when sampling over large
bit sets with a very low sampling rate. In that case, the list of
indices to keep is roughly as large as the bitset itself, and sorting it
requires O(nlog(n)) time, which ends up being significant.

Additionally, walking over every single bit, set or not, to be cleared
or not, is pretty inefficient as well.

This commit optimizes the implementation through a few methods:
- Instead of sampling and sorting indices to keep, it randomly samples
  the size of the gaps between two succesful bernouilli trials.
  This was inspired by the [FastBernoulliTrial] class.
- When the sampling rate is higher than 1/2, it flips the sampling logic
  and uses the gaps between two unsuccessful trials, minimizing the
  number of loop iterations.
- Finally, in order to take full advantage of the gap lengths, it is
  able to quickly scan through the bitset to skip a given number of set
  bits, calling popcnt once per words rather than looking at each bit,
  and can clear entire ranges of bits at once by overwriting entire
  words, rather than masking bits one by one.

This implementation is faster than the previous one for the entire
parameter space. The difference is most drastic for very low sampling
rates where the new implementation is more than two orders of magnitude
faster.

[FastBernoulliTrial]: https://searchfox.org/mozilla-central/rev/a6d25de0c706dbc072407ed5d339aaed1cab43b7/mfbt/FastBernoulliTrial.h
)

When a queued update replaces an entire variable, we can re-use the
vector contained in the update by moving it over, instead of copying it.

This situation is actually pretty common in malariasimulation, where
exponentially decaying variables are entirely overwritten at each time
step. This change speeds up the simulation by about 5% for a 1M
population.
@plietar plietar requested a review from giovannic March 8, 2024 13:59
@plietar plietar changed the title Release/0.1.14 Release 0.1.14 Mar 8, 2024
Copy link

codecov bot commented Mar 8, 2024

Codecov Report

Attention: Patch coverage is 96.93878% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 96.12%. Comparing base (576fe29) to head (4bb3471).
Report is 4 commits behind head on master.

Files Patch % Lines
inst/include/IterableBitset.h 95.65% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #191      +/-   ##
==========================================
- Coverage   96.26%   96.12%   -0.14%     
==========================================
  Files          36       36              
  Lines        1819     1832      +13     
==========================================
+ Hits         1751     1761      +10     
- Misses         68       71       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@plietar plietar merged commit a3ce0e0 into mrc-ide:master Mar 11, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants