Context
We used to hash data rows and give each cluster of data rows an identity equal to the hash of all its children. We're now moving to a more compact representation in which data rows are given an integer key, and we need a way of mapping each new cluster to a new, unique integer key. This is made trickier by the fact that we're parallelising the construction of hierarchical clusters.
Changes proposed in this pull request
Guidance to review
Because keys are always negative, it's possible to build a hierarchy in which keys are themselves members of keyed sets, and it's easy to distinguish integers mapped to raw data points (which are non-negative) from integers that are keys to sets (which are negative). The salt supports a parallel execution model in which each worker maintains its own key space, as long as each worker operates on a disjoint subset of positive integers. The salt and a key are combined via the Cantor pairing function.
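
To make the scheme easier to review, here is a minimal sketch of how a salt and a per-worker counter could be combined into a negative key. It is not the code in this pull request: the names `cantor_pair`, `SaltedKeySpace` and `key_for` are hypothetical, and two details are assumptions made for illustration, namely that the counter starts at 1 and that the paired value is negated to obtain the key.

```python
def cantor_pair(a: int, b: int) -> int:
    """Cantor pairing function: a bijection from pairs of
    non-negative integers to non-negative integers."""
    if a < 0 or b < 0:
        raise ValueError("cantor_pair is only defined for non-negative integers")
    return (a + b) * (a + b + 1) // 2 + b


class SaltedKeySpace:
    """Hypothetical per-worker key space (illustration only).

    Each worker is given a distinct non-negative salt. Keys are the
    negated Cantor pairing of (salt, counter), so two workers with
    different salts can never produce the same key.
    """

    def __init__(self, salt: int) -> None:
        self.salt = salt
        # Assumption: start at 1 so the paired value is strictly
        # positive and the resulting key is strictly negative.
        self._next = 1
        # Maps a frozenset of member integers to its assigned key.
        self._keys = {}

    def key_for(self, members) -> int:
        """Return the key for a set of integers, creating one if needed."""
        members = frozenset(members)
        if members not in self._keys:
            self._keys[members] = -cantor_pair(self.salt, self._next)
            self._next += 1
        return self._keys[members]


# Two workers with different salts, working on disjoint data points.
worker_a = SaltedKeySpace(salt=0)
worker_b = SaltedKeySpace(salt=1)

key_a = worker_a.key_for({0, 1})        # cluster of raw data points
key_b = worker_b.key_for({2, 3})
parent = worker_a.key_for({key_a, 5})   # a key nested inside a keyed set
```

Because the Cantor pairing function is a bijection on pairs of non-negative integers, distinct `(salt, counter)` pairs always yield distinct keys, so workers do not need to coordinate beyond agreeing on their salts.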
Checklist: