
ci skip: Update README.md
flowerinthenight authored Sep 28, 2024
1 parent fda0520 commit 88a0968
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -11,13 +11,13 @@

### On payload size

One of `zgroup`'s main goals is to track clusters whose sizes can change dynamically over time (e.g. [Kubernetes Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), [GCP Instance Groups](https://cloud.google.com/compute/docs/instance-groups), [AWS Autoscaling Groups](https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html), etc.) with minimal dependencies and network load. My previous related works (see [spindle](https://github.com/flowerinthenight/spindle), [hedge](https://github.com/flowerinthenight/hedge)) usually rely on an external service, using traditional heartbeating, to achieve this. That heartbeating technique suffers from payload sizes that grow in proportion to cluster size; I wanted a system without that side effect. Enter [SWIM](https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf)'s infection-style information dissemination.

It can use a constant payload size regardless of the cluster size. SWIM uses a combination of `PING`s, `INDIRECT-PING`s, and `ACK`s to detect member failures while piggybacking on these same messages to propagate membership updates. At the moment, `zgroup` only uses SWIM's direct probing protocol; it doesn't fully implement the Suspicion sub-protocol (yet).
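To make the probing flow concrete, here is a minimal Python sketch of one SWIM probe round. The names, the drop probability, and the helper count are illustrative assumptions, not `zgroup`'s actual implementation:

```python
import random

# One SWIM probe round (illustrative sketch): PING a member directly; on
# timeout, ask K other members to INDIRECT-PING it before declaring failure.
# Note the per-message payload stays constant no matter the cluster size.

K_INDIRECT = 3  # hypothetical number of helpers asked to probe indirectly


def probe(target_alive: bool, helpers_reach_target: bool) -> bool:
    """Return True if the target is deemed alive after one probe round."""
    if target_alive and random.random() > 0.2:  # direct PING may be dropped
        return True  # got a direct ACK
    # Direct PING timed out: fan out INDIRECT-PINGs through K helpers.
    for _ in range(K_INDIRECT):
        if target_alive and helpers_reach_target:
            return True  # a helper relayed an ACK back to us
    return False  # no ACK via any path: mark the member as failed
```

The indirect hop is what makes SWIM robust to a single lossy link: a dropped direct `PING` alone is not enough to declare a member dead.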

### On leader election

I also wanted some sort of leader election capability without depending on an external lock service. At the moment, `zgroup` uses [Raft](https://raft.github.io/raft.pdf)'s leader election subprotocol (without the log management) to achieve this. Note that Raft's leader election relies on stable membership to work properly, so `zgroup`'s leader election is best-effort only; split-brain can still happen while the cluster size is changing. Additional code guards are in place to minimize split-brain scenarios, but they don't eliminate it completely. In my use-case (and testing), gradual cluster size changes are mostly stable, while huge size deltas are not. For example, a big, sudden jump from three nodes (`zgroup`'s minimum size) to, say, a hundred due to autoscaling would cause split-brain.
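The majority requirement above is also why unstable membership causes trouble: if nodes disagree on the cluster size, two candidates can each "win". A minimal Python sketch of one Raft-style election round (names are illustrative, not `zgroup`'s API):

```python
def run_election(peer_voted_terms, current_term):
    """One Raft-style election round (illustrative sketch).
    The candidate bumps its term, votes for itself, and requests votes
    from peers; it becomes leader only with a strict majority."""
    term = current_term + 1
    votes = 1  # the candidate always votes for itself
    for last_voted in peer_voted_terms:
        # A peer grants its vote if it hasn't already voted in this term.
        if last_voted < term:
            votes += 1
    cluster_size = len(peer_voted_terms) + 1
    return term, votes > cluster_size // 2
```

With `peer_voted_terms = [0, 0]` the candidate wins 3 of 3 votes; if both peers already voted in the new term, it only has its own vote and loses. When the perceived `cluster_size` itself is shifting, the majority threshold shifts with it, which is the split-brain window described above.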

### Join address

