Support Various Kinds of Consistent Hash #17817
Hello! I have two questions about this algorithm:
@lucyge2022 You are right. Although jump consistent hash is faster to compute and balances data more evenly across workers, the two points you mentioned are fatal drawbacks.
LGTM
alluxio-bot, merge this please
What changes are proposed in this pull request?
Add Ketama Hashing, Jump Consistent Hashing, Maglev Hashing, and Multi Probe Hashing.
Why are the changes needed?
Alluxio's current user worker selection policy is the Consistent Hash Policy. It is slow to compute, does not distribute load uniformly enough, and is not strictly consistent.
Ketama: https://github.com/RJ/ketama
Jump Consistent Hashing: https://arxiv.org/pdf/1406.2294.pdf
Maglev Hashing: https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/44824.pdf
Multi Probe Hashing: https://arxiv.org/pdf/1505.00062.pdf
We strongly recommend using Maglev Hashing for the User Worker Selection Policy. In most situations it has the lowest time cost, and it is the most uniform and balanced of these hashing policies.
Does this PR introduce any user facing changes?
alluxio.user.worker.selection.policy has the following values: CONSISTENT, JUMP, KETAMA, MAGLEV, MULTI_PROBE, LOCAL, REMOTE_ONLY, corresponding respectively to the consistent hash policy, jump consistent hash policy, ketama hash policy, maglev hash policy, multi-probe hash policy, local worker policy, and remote only policy. The current default value is CONSISTENT.
We recommend using Maglev Hash, which has the best hash consistency and is the least time-consuming. That is to say, set the value of alluxio.user.worker.selection.policy to MAGLEV. We will also consider making this the default value in the future.
Ketama Hashing
alluxio.user.ketama.hash.replicas: The number of replicas in the ketama hashing algorithm. When the set of workers changes, the algorithm guarantees that only a minimal part of the hash table changes. The value should be X times the number of physical nodes in the cluster, where X is chosen to balance efficiency against cost.
Jump Consistent Hashing
None.
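For context on why no setting is needed here, the following is a minimal sketch of the jump consistent hash function as described in the paper linked above. It is not taken from this pull request: the class name JumpHashSketch and the method name jumpHash are hypothetical, and the sketch assumes keys have already been reduced to a 64-bit value and that workers are numbered 0 to numBuckets - 1.

```java
final class JumpHashSketch {
  /**
   * Maps a 64-bit key to a bucket index in [0, numBuckets).
   * The only inputs are the key and the bucket count, so there is
   * nothing to tune. (Hypothetical helper, for illustration only.)
   */
  static int jumpHash(long key, int numBuckets) {
    if (numBuckets < 1) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    long b = -1; // last bucket the key "jumped" to
    long j = 0;  // next candidate bucket
    while (j < numBuckets) {
      b = j;
      // Linear congruential step: the key doubles as the PRNG state.
      key = key * 2862933555777941757L + 1;
      // Jump forward by a pseudo-random factor of at least 1.
      j = (long) ((b + 1) * ((double) (1L << 31) / (double) ((key >>> 33) + 1)));
    }
    return (int) b;
  }

  public static void main(String[] args) {
    // Show where a few keys land with 4 buckets and then with 5 buckets.
    for (long key = 0; key < 5; key++) {
      System.out.println(key + " -> " + jumpHash(key, 4) + " then " + jumpHash(key, 5));
    }
  }
}
```

In this sketch, growing from n to n + 1 buckets either leaves a key where it was or moves it to the new bucket n, which is the consistency property the paper describes.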
Maglev Hashing
alluxio.user.maglev.hash.lookup.size: The size of the lookup table in the maglev hashing algorithm. It must be a prime number. Maglev hashing generates a lookup table for the workers. The larger the lookup table, the smaller the variance of the hashing algorithm, but a larger lookup table consumes more time and memory.
Multi Probe Hashing
alluxio.user.multi.probe.hash.probe.num: The number of probes in the multi-probe hashing algorithm. The more probes are used, the smaller the variance of the hashing algorithm, but more probes consume more time and memory.
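To illustrate what the probe count controls, here is a minimal sketch of multi-probe consistent hashing, loosely following the paper linked above. It is not the implementation from this pull request: the class MultiProbeSketch, its methods, and the FNV-1a-style hash with an extra mixing step are all assumptions chosen to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MultiProbeSketch {
  // Each worker is placed on the ring exactly once (no virtual nodes).
  private final TreeMap<Long, String> ring = new TreeMap<>();
  private final int numProbes;

  MultiProbeSketch(List<String> workers, int numProbes) {
    if (workers.isEmpty() || numProbes < 1) {
      throw new IllegalArgumentException("need at least one worker and one probe");
    }
    this.numProbes = numProbes;
    for (String worker : workers) {
      ring.put(hash(worker, 0), worker);
    }
  }

  /** Stand-in 64-bit hash: FNV-1a over UTF-8 bytes, then a mixing step. */
  static long hash(String s, int seed) {
    long h = 0xcbf29ce484222325L ^ seed;
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xff);
      h *= 0x100000001b3L;
    }
    h ^= h >>> 30;
    h *= 0xbf58476d1ce4e5b9L;
    h ^= h >>> 27;
    h *= 0x94d049bb133111ebL;
    h ^= h >>> 31;
    return h;
  }

  /** Hashes the key numProbes times and keeps the probe closest to a worker. */
  String select(String key) {
    String best = null;
    long bestDistance = 0;
    for (int probe = 1; probe <= numProbes; probe++) {
      long point = hash(key, probe);
      // Next worker clockwise from the probe point, wrapping around the ring.
      Map.Entry<Long, String> next = ring.ceilingEntry(point);
      if (next == null) {
        next = ring.firstEntry();
      }
      // Clockwise distance modulo 2^64, compared as an unsigned value.
      long distance = next.getKey() - point;
      if (best == null || Long.compareUnsigned(distance, bestDistance) < 0) {
        best = next.getValue();
        bestDistance = distance;
      }
    }
    return best;
  }
}
```

Each call to select performs numProbes ring lookups, which is where the extra time cost of a larger probe count comes from in this sketch. Removing a worker only remaps the keys that were assigned to it.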