Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Latest commit

 

History

History
164 lines (143 loc) · 15.4 KB

kid0010.md

File metadata and controls

164 lines (143 loc) · 15.4 KB

KID0010 - Recovery/Consensus Algorithm (KAACE)

Navigation

Back to table of contents

Link Commentary Section
0000 X Glossary, overview, how to use
0001 X Prefixes, Derivation and derivation reference tables
0002 X Data model (field & event concepts and semantics)
0003 X Serialization
0004 X Key Configuration (Signing threshold & key set)
0005 X Next Key Commitment (Pre-Rotation)
0006 X Seals
0007 X Delegation (pending PR by Sam)
0008 X Key-Event State Machine
0009 X Indirect Mode & Witnesses
0010 Recovery/consensus Algorithm (KAACE)
0011 Database & Storage Considerations
0097 n/a Non-Normative Implementation Guidance
0098 n/a Use Cases
0099 n/a Test Vectors and Normative Statement Index

Introduction

Keri’s agreement algorithm for consensus control establishment (KA2CE) or the algorithm, is run by the controller of an identifier in concert with a set of N witnesses designated by the controller to provide as a service the key event history of that identifier via a KERL (Key Event Receipt Log) in a highly available and fault tolerant manner. One motivation for using key event logs is that the operation of redundant immutable (deletion proof) event logs may be parallelizable and hence highly scalable. A KERL is an immutable event log that is made deletion proof by virtue of it being provided by the set of witnesses of which only a subset of F witnesses may at any time be faulty. In addition to designating the witness set, the controller also designates a threshold number, M, of witnesses for accountability. To clarify, the controller accepts accountability for an event when any subset M of the N witnesses confirm that event. The threshold M indicates the minimum number of confirming witnesses the controller deems sufficient given some number F of potentially faulty witnesses. The objective of the service is to provide a verifiable KERL to any validator on demand. Unlike direct mode where a validator may be viewed as an implicit witness, with indirect mode, a validator may not be one of the N explicitly designated witnesses that provide the service.

Witness Designation

The controller designates both the witness tally number and the initial set of witnesses in the inception event configuration. The purpose of the tally is to provide a threshold of accountability for the number of witnesses confirming an event

Subsequent rotation operations may amend the set of witnesses and change the tally number. This enables the controller to replace faulty witnesses and/or change the threshold of accountability of the witness set. When a rotation amends the witnesses it includes the new tally, the set of pruned (removed) witnesses and the set of newly grafted (added) witnesses.

Witnessing Policy

In this approach, the controller of a given identifier creates and disseminates associated key event messages to the set of N witnesses. Each witness, verifies the signatures, content, and consistency of each key event it receives. When a verified key event is also the first version of that event the witness has received then it witnesses that event by signing the event message to create a receipt, storing the receipt in its log (KERL), and returning the receipt as an acknowledgement to the controller. Depending on its dissemination policy a witness may also send its receipt to other witnesses. This might be with a broadcast or gossip protocol or not at all.

Recall as previously defined, the location of an event in its key event sequence is determined by its previous event digest and sequence number. Likewise as previously defined, The version of an event of the same class at given location in the key event sequence is different or inconsistent with some other event of the same class the same location if any of its other content differs or is inconsistent with that other event of the same class and location. To clarify, the event version includes the class of event, establishment or non-establishment. The special case where class matters is that an establishment event (such as rotation) at the same location may supersede a non-establishment event event (such as interaction) in the event of exploit of the non-establishment event. This enables recovery of live exploit of the exposed current set of authoritative keys used to sign non-establishment events via a rotation establishment event to the unexposed next set of authoritative keys. The specific details of this recovery are explained later (see Section 11.6).

In general, the witnessing policy is that the first seen version of an event always wins, that is, the first verified version is witnessed (signed, stored, acknowledged and maybe disseminated) and all other versions are discarded. The exception to this general rule is that an establishment event may recover following a set of exploited non-establishment events. The recovery process may fork off a branch from the recovered trunk. This disputed branch has the disputed exploited events, and the main trunk has the recovered events. The operational mode (see Section 10.), and the threshold of accountable duplicity determine which events in the disputed branch are accountable to the controller (see Section 11.6). Later messages or receipts from other witnesses may not change any existing entry in the log (the log is append only i.e. immutable). Each witness also adds to its log any verified signatures from consistent receipts it receives from other witnesses. A consistent receipt is a receipt for the same version of the event already in its log. Excepting recovery, inconsistent receipts i.e. for different event versions at the same location are discarded (not kept in the log). Although as an option, a controller may choose to run a juror (in concert with a witness) that keeps a duplicitous event log (DEL) of the inconsistent or duplicitous receipts that a witness receives. To clarify, a witness’ key event receipt log (KERL) is by construction an immutable log. This log includes the events with attached verified signatures, that are the receipts from the controller, the witness itself, and other witnesses.

Initial dissemination of receipts to the N witnesses by the controller may be implemented extremely efficiently with respect to network bandwidth using a round-robin protocol of exchanges between the controller and each of the witnesses in turn. Each time the controller connects to a witness to send new events and collect the new event receipts, it also sends the receipts it has received so far from other witnesses. This round-robin protocol may require the controller to perform at most two passes through the entire set of witnesses in order to fully disseminate a receipt from each witness to every other witness for a given event,This means that at most 2·N acknowledged exchanges are needed for each event in order to create a fully witnessed key event receipt log (KERL) at each and every witness and the controller. Network load therefore scales linearly with the number of witnesses. When network bandwidth is less constrained then a gossip protocol might provide full dissemination with lower latency than a round robin protocol. Gossip protocols are a relatively efficient mechanism and scale with N · log(N) (where N is the number of witnesses) instead of 2·N. A directed acyclic graph or other data structure can be used to determine what needs to be gossiped.

Event Escrow

KERI is designed to be highly protocol agnostic, that is, the transport protocol a controller chooses to send events to its witnesses may be implementation dependent. Some of these protocols may benefit, usually for performance reasons, from the witnesses holding events in escrow. There are two primary use cases: out-of-order events and multi-signature events. Event escrow is an optional implementation specific configurable capability of controllers and witnesses implementations. A witness may employ neither, one or both of an out-of-order event escrow cache and an in-order partial multi-signature event escrow cache. Out-of-order events may be first held in on out-of-order cache before being processed by an in-order but partial multi-signature cache.

Out-Of-Order

When the transport protocol is asynchronous such as UTP, then the events may arrive at a witness out of order. An out of order event is not fully verifiable because without a copy of the immediately preceding event in the sequence, the prior event digest is not verifiable against the prior event. Moreover, for an out-of-order rotation (establishment) event, the newly rotated current set of keys may not be verifiable against the next keys digest from most recent prior establishment event when that prior establishment event has not yet been received and entered into the KEL. Should a witness drop an otherwise verifiable but out of order event, then that event must eventually be retransmitted. This adds network load and latency. Holding otherwise verifiable but out-of-order events in a tunable escrow cache until the missing intervening events show up may be a good way of optimizing network bandwidth and latency in asynchronous protocols. An otherwise verifiable event means that what can be verified does verify. If what can be verified does not verify then there is no reason to hold the event in escrow.

An escrow cache of unverified out-of-order event provides an opportunity for malicious attackers to send forged event that may fill up the cache as a type of denial of service attack. For this reason escrow caches are typically FIFO (first-in-first-out) where older events are flushed to make room for newer events.

Multi-Signature

When the identifier prefix established by the KEL uses a multi-signature scheme with a set of authoritative key-pairs there may be a corresponding set of controllers, where each controller holds one or more of the key-pairs from the set. This may be called a multi-controller or collective controller case. This set of controllers may use a collection protocol amongst themselves where one of the controllers first collects enough signatures to reach the multi-signature threshold before sending the event with signatures to the witnesses. In addition to latency, this may make the designated collecting controller a single point of failure.

In this multi-controller case, an alternative approach to collecting signatures is for the witnesses to directly collect events from each of the controllers,, but where each event has only a partial signature set. An event with a partial signature set would not verify against the threshold requirement for the event but may otherwise be verifiable with the provided signatures. In this case the event could be held in escrow until matching events, but with different partial signature sets from other members of the set of controllers, provide enough signatures to reach the threshold. A multi-signature threshold event escrow capability may be more performant and reliable that a two stage protocol where the full set of signatures must first be collected before being sent to the witnesses.

In this case, an in-order but otherwise verifiable event may be held in a multi-signature event escrow cache if the set of attached signatures includes at least one verifiable signature and no unverifiable signatures. This minimizes the exposure of the partial multi-signature escrow cache to a denial of service attack using forged events.

Consensus Policy

The purpose of using N designated witnesses in the algorithm is to ensure both the availability and the security of the service in the event of faulty witnesses. In addition to faults such as unresponsiveness and network interruptions, allowed faults include what are commonly known as Byzantine faults such as malicious or duplicitous behavior (dishonesty) by the witnesses. A faulty controller is a special case to be discussed below. A dishonest witness may promulgate receipts for inconsistent versions of an event or neglect to store or disseminate receipts. An unresponsive but honest witness may be unavailable or otherwise unable to promulgate a receipt. A malicious witness may intentionally be unresponsive. The term Byzantine agreement (BA) is used to describe a type of algorithm that provides some guarantee of consensus agreement despite Byzantine faults, i.e. it’s Byzantine Fault Tolerant (BFT) [36]. Unlike more conventional BA algorithms there is no requirement in KA2CE that the consensus process provide a total ordering of events. That ordering is already provided to the algorithm as an input by the controller. The purpose of the consensus agreement process in KA2CE is to produce for each event sourced by the controller a verifiable agreement that includes the events and signatures (receipts) from a sufficient number of witnesses in a secure highly available manner. The algorithm merely produces verifiable confirmations of already ordered events. This greatly simplifies the algorithm.

Consensus agreement guarantees in conventional BA algorithms are often characterized with the terms correct, safe, and live [39]. These guarantees exist for algorithms that undertake multiple phases or rounds in coming to consensus. This complicates the algorithms because agreement at one phase may not propagate to the next phase. In the case of this algorithm, KA2 CE, the consensus process is different enough that there are no direct analogous definitions of correct, safe, and live albeit some of the related constraints in KA2CE are evocative of the constraints in conventional BA algorithms associated with those terms. In order to better understand the differences we first summarize the usual definitions of these terms before providing our terminology. Typically F is defined to be maximum number of faulty nodes at any time (any type of fault). Given at most F faults then any consensus agreement of F +1 nodes is consistent because at least one the F +1 nodes is honest. Therefore this agreement is called correct. But there is no guarantee that any set of F +1 nodes will ever be in agreement. A group of nodes that contribute to or come to consensus agreement is called a quorum. In this case F +1 is the quorum size for correct agreement. Furthermore suppose that N = 2F +1, where N is total number of nodes. Then a correct agreement is guaranteed because at least F +1 of the N nodes are honest and they also form a majority of all nodes so no other agreement is possible. Therefore this agreement is called safe. But there is no guarantee that the nodes in agreement (quorum) will know they came to agreement in a subsequent round because as many as F of the nodes in agreement in a given round may subsequently become unresponsive in the next round. In a multi-phase or multi-round algorithm, this means that despite being safe, the agreement is not live because the agreement may not propagate to the next round. Suppose instead that N = 3F +1 , then a correct agreement is guaranteed as before but the agreement will have 2F +1 nodes (quorum size) and despite some F of those 2F +1nodes becoming unresponsive, the remaining F +1 honest and responsive nodes will be able to propagate that correct safe agreement to the next round. Thus N = 3F +1 with a quorum size of 2F +1 guarantees a correct, safe, and live agreement across multiple rounds.