Reduce memory consumption during discrete pruning #1898
Merged
When running pruning, especially with a large number of discrete variables, the code would try to collect all possible values in a `std::vector`. Since the number of joint assignments is the product of the variable cardinalities, having >20 discrete variables with cardinality 2 each leads to vectors with over a million entries (2^20 = 1,048,576), taking up a large amount of memory. Once the allocated heap memory exceeded a limit, the OS would automatically kill the process.
To address this, I implemented a simple `MinHeap` class and used it to maintain at most `maxNrAssignments` values in the heap, popping smaller values as necessary (a sketch of the idea is below).

This successfully addresses the memory issue: profiling with gperftools shows the heap profile dropping from 25 GB to less than 1 MB. Consequently, I have been able to run the hybrid estimator to over 120 timesteps, up from a previous limit of 28.
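For reference, here is a minimal sketch of the bounded-heap idea. This is not the actual `MinHeap` class from this PR; the class and member names are illustrative. The key property is that memory stays O(maxNrAssignments) no matter how many assignments are streamed through:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Keep only the `maxSize` largest values seen so far by maintaining a
// min-heap and evicting its smallest element whenever the heap grows
// past the cap.
class BoundedMinHeap {
 public:
  explicit BoundedMinHeap(std::size_t maxSize) : maxSize_(maxSize) {}

  // Insert a value; if the heap exceeds its capacity, drop the smallest.
  void push(double value) {
    heap_.push(value);
    if (heap_.size() > maxSize_) heap_.pop();
  }

  // Smallest retained value, i.e. the current pruning threshold.
  double min() const { return heap_.top(); }

  std::size_t size() const { return heap_.size(); }

 private:
  std::size_t maxSize_;
  std::priority_queue<double, std::vector<double>, std::greater<double>> heap_;
};

int main() {
  // Usage: stream over assignment values without materializing them all.
  BoundedMinHeap heap(/*maxSize=*/5);
  for (double p : {0.1, 0.9, 0.4, 0.7, 0.2, 0.8}) heap.push(p);
  // heap.min() is now the 5th-largest value; anything below it is pruned.
  return 0;
}
```

The trade-off is a log(maxNrAssignments) cost per insertion in exchange for never holding more than maxNrAssignments values at once.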
The estimator still struggles once we go beyond 150 timesteps, due to the large number of discrete variables at that point. This seems like a good candidate for some sort of marginalization scheme, since the initial 50 discrete variables shouldn't be affected much.