
Reduce memory consumption during discrete pruning #1898

Merged
merged 4 commits into develop from improve-memory on Nov 7, 2024

Conversation

varunagrawal
Collaborator

When running pruning, especially with a large number of discrete variables, the code would try to collect all the values in a std::vector. In a case with more than 20 discrete variables of cardinality 2 each, this led to vectors with over a million entries, taking up a large amount of memory.
Once the allocated heap memory exceeded a limit, the OS would automatically kill the process.

To address this, I implemented a simple MinHeap class and used it to maintain at most maxNrAssignments values in the heap, popping smaller values as necessary.
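
For reference, here is a minimal sketch of the bounded min-heap idea, not the exact class in this PR (the real MinHeap API may differ); `maxSize` here plays the role of maxNrAssignments:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Keep only the largest maxSize values seen so far, using a min-heap over a
// std::vector: the smallest retained value sits at the front and is evicted
// whenever a larger value arrives.
class MinHeap {
 public:
  explicit MinHeap(size_t maxSize) : maxSize_(maxSize) {}

  void push(double value) {
    if (data_.size() < maxSize_) {
      data_.push_back(value);
      std::push_heap(data_.begin(), data_.end(), std::greater<double>());
    } else if (value > data_.front()) {
      // Evict the current minimum, then insert the new, larger value.
      std::pop_heap(data_.begin(), data_.end(), std::greater<double>());
      data_.back() = value;
      std::push_heap(data_.begin(), data_.end(), std::greater<double>());
    }
  }

  double min() const { return data_.front(); }
  const std::vector<double>& values() const { return data_; }

 private:
  size_t maxSize_;
  std::vector<double> data_;  // heap-ordered via std::*_heap with std::greater
};
```

Since the heap never holds more than maxNrAssignments entries, memory stays bounded regardless of how many discrete assignments are enumerated.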

This successfully addresses the memory issue, and on profiling with gperftools, the heap profile drops from 25 GB to less than 1 MB. Consequently, I have been able to run the hybrid estimator to over 120 timesteps, up from a limit of 28.

The estimator still struggles once we go beyond 150 timesteps, due to the large number of discrete variables at that point. This seems like a good candidate for some sort of marginalization scheme, since the initial 50 discrete variables shouldn't be affected much.

@varunagrawal self-assigned this Nov 6, 2024
@dellaert (Member) left a comment

Obvious question: why not use a built-in heap data structure from the standard library? Is there not one that satisfies our needs?

@varunagrawal (Collaborator, Author) commented Nov 6, 2024

> Obvious question: why not use a built-in heap data structure from the standard library? Is there not one that satisfies our needs?

I tried looking for one as my first plan of attack, but some sources recommend std::priority_queue, while others recommend using the standard heap functions (std::make_heap, std::push_heap, std::pop_heap) on top of a container.
Since I only ever keep the top N values, I found the latter to be efficient and useful, especially since std::priority_queue also uses a std::vector as its underlying container.

Additionally, using a std::vector with the heap functions let me customize the API (allowing for convenient printing, for example) while still mimicking the simple interface of std::priority_queue. Win-win.
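
To illustrate the trade-off, here is a small self-contained sketch with made-up values (not code from this PR):

```cpp
#include <algorithm>
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

int main() {
  // Option 1: std::priority_queue as a min-heap. The underlying container is
  // hidden, so inspecting all contents requires popping elements one by one.
  std::priority_queue<double, std::vector<double>, std::greater<double>> pq;
  for (double v : {0.9, 0.1, 0.4}) pq.push(v);
  std::cout << "priority_queue top: " << pq.top() << "\n";  // prints 0.1

  // Option 2: the heap algorithms over a plain std::vector. The container
  // stays visible, so its contents can be printed without destroying the heap.
  std::vector<double> heap{0.9, 0.1, 0.4};
  std::make_heap(heap.begin(), heap.end(), std::greater<double>());
  std::cout << "vector-backed heap: ";
  for (double v : heap) std::cout << v << " ";  // minimum is heap.front()
  std::cout << "\n";
  return 0;
}
```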

@dellaert (Member) left a comment

Great!

@varunagrawal merged commit 04768a7 into develop on Nov 7, 2024
33 checks passed
@varunagrawal deleted the improve-memory branch on November 7, 2024 at 13:52