libgenders: rearchitect internal libgenders datastructures #84
Problem: The internal libgenders data structures were designed
back when the majority of genders files stored only one node per
line. They build up a number of lists, as well as hashes that point
to many attrs, vals, and other lists.
This design has shown itself to perform poorly on very large
clusters, such as those with 10K nodes or more.
Re-do the entire set of data structures internal to libgenders.
This re-architecture improves performance in the average/normal case, but
could perform worse in worst-case scenarios. Most notably, it is now up
to users to write "smart" genders files.
For example, hostranges should always be used in genders files. The
legacy layout of one node per line should be avoided.
Using nodeattr's --compress-hosts option should help as well.
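
To illustrate what a "smart" genders file means here, the sketch below
collapses a list of node names into hostrange form. It is only an
illustration: compress_hosts is a hypothetical helper, not part of
libgenders and not nodeattr's implementation, and it assumes node names
end in a numeric suffix without zero padding.

```python
import re
from itertools import groupby


def compress_hosts(hosts):
    """Collapse names like node1, node2, node3 into node[1-3].

    Hypothetical helper for illustration only. Names without a numeric
    suffix are passed through unchanged. Zero padding (node01) is not
    preserved by this sketch.
    """
    by_prefix = {}  # prefix -> set of numeric suffixes
    plain = []      # names with no numeric suffix
    for host in hosts:
        match = re.fullmatch(r"(.*?)(\d+)", host)
        if match:
            by_prefix.setdefault(match.group(1), set()).add(int(match.group(2)))
        else:
            plain.append(host)

    out = list(plain)
    for prefix, nums in by_prefix.items():
        nums = sorted(nums)
        if len(nums) == 1:
            out.append(f"{prefix}{nums[0]}")
            continue
        ranges = []
        # Consecutive numbers share the same (value - index) key.
        for _, group in groupby(enumerate(nums), key=lambda p: p[1] - p[0]):
            run = [n for _, n in group]
            ranges.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
        out.append(f"{prefix}[{','.join(ranges)}]")
    return out


if __name__ == "__main__":
    # Four one-node entries become a single hostrange entry.
    print(compress_hosts(["node1", "node2", "node3", "node5", "login"]))
```

With input like this, four separate node lines in a genders file could
be written as a single line keyed on the hostrange, which is the kind of
file the new data structures are designed for.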
Fixes #70