From 9b351105cc19362a864533758e3bb51b84c61d06 Mon Sep 17 00:00:00 2001
From: flowerinthenight
Date: Sat, 28 Sep 2024 13:27:09 +0900
Subject: [PATCH] ci skip: Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 83d9a49..a0e888d 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,8 @@
 One of zgroup's main goals is to be able to track clusters with sizes that can change dynamically over time (e.g. [Kubernetes Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), [GCP Instance Groups](https://cloud.google.com/compute/docs/instance-groups), [AWS Autoscaling Groups](https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html), etc.) with minimal dependencies and network load. My previous related works (see [spindle](https://github.com/flowerinthenight/spindle), [hedge](https://github.com/flowerinthenight/hedge)) usually rely on some external service, using traditional heartbeating, to achieve this. This heartbeating technique usually suffers from increasing payload sizes (proportional to the cluster size) as clusters get bigger, but I wanted a system that doesn't have that side effect. Enter [SWIM](https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf)'s infection-style information dissemination, which can use a constant payload size regardless of the cluster size. SWIM uses a combination of `PING`s, `INDIRECT-PING`s, and `ACK`s to detect member failures while piggybacking on these same messages to propagate membership updates (gossip protocol). At the moment, zgroup only uses SWIM's direct probing protocol; it doesn't fully implement the Suspicion sub-protocol (yet).
 
+At the moment, zgroup uses a single, 64-byte payload for all of its messages, including leader election (see below).
+
 ### On leader election
 
 I also wanted some sort of leader election capability without depending on an external lock service. At the moment, `zgroup` uses [Raft](https://raft.github.io/raft.pdf)'s leader election algorithm sub-protocol (without the log management) to achieve this. Note that Raft's leader election algorithm depends on stable membership to work properly, so zgroup's leader election is best-effort only; split-brain can still happen while the cluster size is changing. Additional code guards are in place to minimize split-brain in these scenarios, but it is not completely eliminated. In my use-case (and testing), gradual cluster size changes are mostly stable, while sudden changes with large size deltas are not. For example, a big, sudden jump from three nodes (zgroup's minimum size) to, say, a hundred, due to autoscaling, would cause split-brain. Once the target size is achieved, however, a single leader is always elected.
 
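
As an aside, here is a minimal sketch in Zig of the fixed-size payload idea the patched README text describes. The `Payload` struct, its field names, and the `MsgType` values are illustrative assumptions, not zgroup's actual wire format; the only point shown is that one 64-byte message can carry both SWIM probe traffic and election state, so the per-message size stays constant as the cluster grows.

```zig
const std = @import("std");

// Illustrative only -- not zgroup's actual wire format. Every message,
// whether a SWIM probe or election traffic, fits in one fixed-size 64-byte
// payload, so per-message network load stays constant as the cluster grows.
const MsgType = enum(u8) { ping, indirect_ping, ack, heartbeat };

const Payload = extern struct {
    term: u64, // election term (Raft-style)
    incarnation: u64, // SWIM incarnation number of the subject member
    src_ip: u32, // sender address
    subject_ip: u32, // member being probed or gossiped about
    src_port: u16,
    subject_port: u16,
    msg_type: MsgType, // which protocol message this payload carries
    _reserved: [35]u8, // pad to exactly 64 bytes
};

comptime {
    // The payload size is independent of the cluster size.
    std.debug.assert(@sizeOf(Payload) == 64);
}
```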
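Similarly, the Raft-style election rule the README refers to (one vote per term, majority of the current member list) can be sketched as follows. `Node`, `handleVoteRequest`, and `hasMajority` are hypothetical names for illustration, not zgroup's API; the sketch also hints at why an unstable member count makes the majority calculation, and therefore leader election, best-effort only.

```zig
const std = @import("std");

// Hypothetical sketch of Raft's leader-election rule, not zgroup's API:
// a member grants at most one vote per term, and a candidate wins only with
// a majority of the *current* member list -- so while the cluster size is
// still changing, two candidates can each see a "majority" (split-brain).
const Node = struct {
    current_term: u64 = 0,
    voted_for: ?u32 = null, // member id voted for in current_term

    fn handleVoteRequest(self: *Node, candidate_id: u32, candidate_term: u64) bool {
        if (candidate_term < self.current_term) return false; // stale candidate
        if (candidate_term > self.current_term) {
            // Newer term observed: adopt it and clear the previous vote.
            self.current_term = candidate_term;
            self.voted_for = null;
        }
        if (self.voted_for == null or self.voted_for.? == candidate_id) {
            self.voted_for = candidate_id; // grant at most one vote per term
            return true;
        }
        return false;
    }
};

fn hasMajority(votes: usize, cluster_size: usize) bool {
    return votes > cluster_size / 2;
}

test "single vote per term, majority of current size" {
    var n = Node{};
    try std.testing.expect(n.handleVoteRequest(1, 1));
    try std.testing.expect(!n.handleVoteRequest(2, 1)); // already voted this term
    try std.testing.expect(n.handleVoteRequest(2, 2)); // new term, vote again
    try std.testing.expect(hasMajority(2, 3));
    try std.testing.expect(!hasMajority(50, 100));
}
```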