Zeus is a datastore that offers fast locality-aware distributed transactions with strong consistency and availability. A brief description follows and more details can be found in the Eurosys'21 paper.
This is the publicly available artifact repository supporting Zeus, which contains the specification of the two protocols that enable Zeus locality-aware reliable transactions; the ownership protocol and the reliable commit protocol of Zeus. The specifications are written in TLA+ and can be used to verify Zeus's correctness via model-checking.
Zeus protocols build on ideas of Hermes and draws inspiration from cache coherence and hardware transactional memory exapting ideas to a replicated distributed setting for availability. Inspired concepts include the invalidation-based design of both proposed protocols and Zeus's approach to move objects and ensure exclusive write access (ownership) to the coordinator of a write transaction.
@inproceedings{Katsarakis:21,
author = {Katsarakis, Antonios and Ma, Yijun and Tan, Zhaowei and Bainbridge, Andrew and Balkwill, Matthew and Dragojevic, Aleksandar and Grot, Boris and Radunovic, Bozidar and Zhang, Yongguang},
title = {Zeus: Locality-Aware Distributed Transactions},
year = {2021},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {Proceedings of the Sixteenth European Conference on Computer Systems},
location = {Online Event, United Kingdom},
series = {EuroSys '21}
}
Transactions in Zeus involve three main phases:
- Prepare & Execute: Execute the transaction locally;
If locality is not captured (i.e., if accessing an object not local to the executor -- or missing exclussive write access for write transactions)
→ the object (and/or permissions) are acquired via the ownership protocol- Exclusive owner guarantee: at any time, at most one node with exclusive write access (i.e., owner) to an object
- Fast/slow-path design: to acquire ownership (and data) in at most 1.5 RTT regardless of the requesting node in the absence of faults
- Fault-tolerant: each ownership protocol step is idempotent to recover from faults
- Local Commit: Any traditional single node (unreliable -- i.e., non-replicated) commit
- Reliable Commit: Replicate updates to sharers for data availability:
- Fast Commit: 1RTT that is also pipelined to hide the latency
- Read-only optimized transactions: strictly serializable and local from any replica
- Fault-tolerant: each reliable commit step is idempotent to recover from faults
Faults: The specification and model checking assumes that crash-stop node faults and message reorderings may occur.
Message losses in Zeus are handled via retransmissions. The exact failure model can be found in the paper.
Strong Consistency: Zeus transactions guarantee the strongest consistency (i.e., are strictly serializable).
Invariants: A list of model-checked invariants provided by the protocols follows
- Amongst concurrent ownership requests to the same object, at most one succeeds.
- At any time, there is at most one valid owner of an object.
- A valid owner of an object has the most up-to-date data and version among live replicas.
- All valid sharer vectors (stored by directory nodes and the owner) of an object agree on the object's sharers and ownership timestamp (o_ts).
- The owner and readers are always correctly reflected by all valid sharer vectors.
- A replica found in the valid state stores the latest committed value of an object.
To model check the protocols, you need to download and install the TLA+ Toolbox so that you can run the TLC model checker using either the Reliable commit or ownership TLA+ specifications. We next list the steps to model check Zeus's reliable commit protocol (model checking the ownership protocol is similar).
- Prerequisites: Any OS with Java 1.8 or later, to accommodate the TLA+ Toolbox.
- Download and install the TLA+ Toolbox.
- Launch the TLA+ Toolbox.
- Create a spec: File>Open Spec>Add New Spec...; Browse and use zeus/reliable_commit_protocol/ZeusReliableCommit.tla as root module to finish.
- Create a new Model: Navigate to TLC Model Checker>New model...; and create a model with the name "reliable-commit-model".
- Setup Behavior: In Model Overview tab of the model, and under the "What is the behavior spec?" section, select "Temporal formula" and write "Spec".
- Setup Constants: Then specify the values of declared constants (under "What is the model?" section). You may use low values for constants to check correctness without exploding the state space. An example configuration would be three nodes and maximum versions of two or three. To accomplish that, you would need to click on each constant and select the "ordinary assignment" option. Then fill the box for version and epoch constants (e.g., R_MAX_VERSION) with a small number (e.g., with "2" or "3") and for any node related fields (e.g., R_NODES) with a set of nodes (e.g., "{1,2,3}" -- for three nodes).
- The reliable commit specification is a single TLA+ module in zeus/reliable_commit_protocol folder.
- The ownership specification is decoupled into three modules under the zeus/ownership_protocol folder for simplicity. ZeusOwnership.tla and ZeusOwnershipMeta.tla specify (and can be used to model check) the ownership protocol in the absence of faults. The specification with failures is built on top of those in the module ZeusOwnershipFaults.tla.
- The reliable commit specification does not include the pipelining optimization yet, and the ownership specification focuses on the slow-path for now -- which is mandatory to model check faults.
- Apart from acquiring ownership, the ownership protocol can be utilized to handle other dynamic sharding actions (e.g., remove or add a reader replica) which were omitted from the paper. We may describe those in a separate online document if there is interest.
This work is freely distributed under the Apache 2.0 License.
Antonios Katsarakis: antonis.io
| [email protected]