- Amazon Dynamo Paper
- Scalable Distributed Transactions across Heterogeneous Stores
- Raft Consensus Algorithm
Events are organized in topics. In PolarStreams, topics are always multi-producer and multi-consumer. To achieve high availability and durability, topic events are persisted on disk on multiple PolarStreams brokers.
Data is automatically distributed across brokers using consistent hashing (Murmur3 tokens), similarly to Amazon DynamoDB and Apache Cassandra. Each broker is assigned a token based on its ordinal index within the cluster, which determines the data that naturally belongs to that broker.
Based on the event's partition key and the hashing algorithm, each event is placed on a broker and replicated to the following two brokers in the cluster.
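The placement logic above can be sketched in Go. PolarStreams uses Murmur3 tokens; here FNV-1a from the standard library stands in so the sketch stays dependency-free, and the evenly spaced token layout is illustrative rather than the project's exact token function:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// brokerTokens assigns each broker ordinal an evenly spaced token
// on the 64-bit ring (illustrative layout, not the exact scheme).
func brokerTokens(n int) []uint64 {
	step := ^uint64(0) / uint64(n)
	tokens := make([]uint64, n)
	for i := range tokens {
		tokens[i] = uint64(i) * step
	}
	return tokens
}

// placement returns the natural leader and the two follower replicas
// for a partition key: the leader owns the range that starts at the
// greatest token <= hash(key); replicas are the next two brokers in the ring.
func placement(key string, tokens []uint64) (leader int, followers [2]int) {
	h := fnv.New64a() // stand-in for Murmur3
	h.Write([]byte(key))
	t := h.Sum64()
	i := sort.Search(len(tokens), func(i int) bool { return tokens[i] > t })
	leader = i - 1 // tokens[0] == 0, so i is always >= 1
	followers = [2]int{(leader + 1) % len(tokens), (leader + 2) % len(tokens)}
	return
}

func main() {
	tokens := brokerTokens(3)
	leader, followers := placement("device-42", tokens)
	fmt.Printf("leader=broker-%d replicas=%v\n", leader, followers)
}
```

Because the hash is deterministic, every broker computes the same leader and replicas for a given key without any coordination.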
In the preceding diagram, Broker A is the natural leader of token A. When Broker A is considered unavailable by other brokers, Broker B will take ownership of range (A, B) after a series of strongly consistent operations. In case both A and B are considered down, C will not try to take ownership of range (A, B), as it won't be able to guarantee the minimum number of replicas of the data.
PolarStreams assigns tokens to brokers deterministically (i.e., the broker with ordinal 2 will always have the same token). A new broker added to an existing cluster is placed in the middle of a previous token range, splitting it in half. In the same way, removing a broker merges its range back into the previous one, doubling that range's size.
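The split rule reduces to ring arithmetic: a new broker's token is the midpoint of an existing range. A minimal sketch, assuming 64-bit tokens where unsigned subtraction in Go wraps modulo 2^64 and therefore handles ranges that cross the end of the ring for free:

```go
package main

import "fmt"

// midpoint returns the token halfway through the ring range (start, end).
// end-start wraps modulo 2^64 in Go, so ranges crossing zero work unchanged.
func midpoint(start, end uint64) uint64 {
	return start + (end-start)/2
}

func main() {
	const step = ^uint64(0) / 3        // 3-broker cluster, evenly spaced tokens
	a, b := uint64(0), step            // range (A, B) owned by broker A
	fmt.Println(midpoint(a, b) == step/2) // a 4th broker would land halfway
}
```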
This technique provides a simple way to add or remove brokers without rebalancing existing data. New brokers can take ownership of their natural ranges when ready, with the help of the previous owners, without disrupting availability [additional info needed related to how the ownership decisions are taken / transactions for election of a token leader].
A producer doesn't need to understand this placement scheme to publish an event. It can target any broker, or the Kubernetes Service, and the event message will be routed automatically. From the client's perspective, producing a message is just calling an HTTP endpoint.
To consume events, a client polls all live brokers for new data. Each broker determines whether that consumer should be served topic events of a given partition, depending on the consumer placement.
PolarStreams guarantees that any consumer of a given topic and partition key will always read those events in the same order in which they were written.
Internally, PolarStreams uses a series of I/O-related techniques that make it both fast and lightweight, supporting high throughput and consistently low latency while keeping compute and memory resource utilization low. Learn more about these techniques in the I/O Documentation.
PolarStreams uses a series of TCP connections to communicate between brokers for different purposes:
- Gossip: PolarStreams uses a Gossip protocol to exchange state information about the brokers participating in the cluster. Each broker uses it to agree with its neighboring brokers on token range ownership and on the consumer view/topology.
- Data replication: All data is automatically replicated to the following brokers in the ring. Compressed groups of events of a certain partition are sent periodically to the followers of the partition leader.
- Producer routing: When a producer client sends a new event message to a broker that is not the leader of the partition, the receiving broker routes the message to the correct leader in the foreground (synchronously).
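The producer-routing decision from the last bullet can be sketched as a small Go function. The names and signatures are assumptions, not PolarStreams internals; the point is that forwarding happens in the foreground, so the client still sees a single request/response:

```go
package main

import "fmt"

// routeProduce is an illustrative sketch of broker-side routing: a broker
// that does not lead the event's partition forwards it synchronously to the
// leader and relays the result; the natural leader stores it locally.
func routeProduce(self int, leaderOf func(partitionKey string) int, key string,
	storeLocal func() error, forwardTo func(leader int) error) error {
	if leader := leaderOf(key); leader != self {
		return forwardTo(leader) // relay the leader's response to the client
	}
	return storeLocal() // we lead this range: append locally, then replicate
}

func main() {
	leaderOf := func(string) int { return 2 } // pretend broker 2 leads this key
	forwarded := -1
	_ = routeProduce(0, leaderOf, "device-42",
		func() error { return nil },
		func(leader int) error { forwarded = leader; return nil })
	fmt.Println("forwarded to broker", forwarded)
}
```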