Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent hashing #19

Open
japananh opened this issue Oct 13, 2023 · 0 comments
Open

Consistent hashing #19

japananh opened this issue Oct 13, 2023 · 0 comments

Comments

@japananh
Copy link
Owner

japananh commented Oct 13, 2023

Consistent Hashing

Overview

Consistent hashing is a technique used in distributed computing and data storage systems, often relevant in the context of backend development.

without consistent hashing

Example

Imagine you have a distributed system with 4 nodes, and you want to distribute data items across these nodes using consistent hashing.

1. Node Identification

Screenshot 2023-10-14 at 00 12 05
  • Node S0 is assigned a hash identifier of S%360=0.
  • Node S90 is assigned a hash identifier of S%360=90.
  • Node S180 is assigned a hash identifier of S%360=180.
  • Node S270 is assigned a hash identifier of S%360=270.

2. Data Item Mapping

Data items are hashed using the same hash function, producing numerical values between 0 and N. We'll consider a few data items and their hash values:

  • Data Item 1 hashes to 1000. -> 1000%360=280. 280 is between S270 and S0 -> Data item 1 is mapped to the nearest node following it, which is Node S0 (hash 0)
  • Data Item 2 hashes to 1500. -> S90
  • Data Item 3 hashes to 2000. -> S270
  • Data Item 4 hashes to 3000. -> S120
  • Data Item 5 hashes to 4000. -> S90

Now, the data items are consistently distributed across the nodes.

3. Add/Remove Node

Screenshot 2023-10-14 at 10 52 21

If a new node, S50, with a hash identifier of 50, is added to the system, only Data Item 5 (hash 4000) needs to be remapped, and it will be assigned to Node S50. The rest of the data items remain with their current nodes. This minimizes data movement and keeps the distribution efficient.

Similarly, if a node goes offline, its data can be reassigned to the next available node on the ring, ensuring data availability and load balancing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant