How to reach consensus fast and in a lazy way #370
cason started this conversation in Specifications
This discussion should serve as a basis for the design of the node synchronization protocol, tracked in #260.
Assume that we have a node that is not a validator (at the considered heights or at all), or that is slow or to some extent lagging behind in the protocol. What is the best approach for it to progress as fast as possible, committing valid blocks, without misbehaving but while being selfish and careless, namely, not performing any action it is not forced to perform while still guaranteeing progress?
Commits or Precommit certificates
The first priority of this node is to receive, or to look in its message log for, a `Commit`, namely, a set of `Precommit` messages for the same height, round, and value whose combined voting power is at least `2f + 1`. Once a `Commit` for a height is found and validated, the node does not need to consider any other vote message for that same height, whether newly received or already in its message log.

The only information that the node needs for a height for which it has a `Commit` is the committed value. I assume here that votes and values are propagated independently. Moreover, I assume that the `Commit`, or more precisely one of the `Precommit`s it contains, carries enough information for the node to find the corresponding full value.

Notice that if the node is a validator and finds a `Commit` for height `H`, its participation in height `H` was not needed. So, following the rationale of minimum effort, a node could just wait for enough `Precommit` messages to cheaply decide a height.

Polkas or Prevote certificates
If our lazy node does not see, after a while, a `Commit` for a height where it is one of the validators, this is an important indication that not enough `Precommit` messages were issued, and that our lazy node may have to issue a `Precommit` to allow the system to progress. This is particularly true when `Precommit` messages for that height and a round are available and at least one of them was produced by a correct node, i.e., when matching `Precommit` messages with a combined voting power of at least `f + 1` are available.

In order to issue a `Precommit` for a value in that height and round, the node has to see what is called a `Polka`, i.e., a set of `Prevote` messages for the same height, round, and value whose combined voting power is at least `2f + 1`. This is therefore the second priority of a lazy node: when there is no `Commit` in sight, it must look for a `Polka`, preferably the most recent (highest round) one.

Once it has a `Polka` for a height, round, and value, the node needs to retrieve the associated full value. The same considerations as for a `Commit` apply here. Once a `Polka` and the corresponding full value are retrieved, the validator issues its `Precommit`. From this point, and assuming other correct nodes are participating, the node should eventually see a `Commit` and decide the height.

Proposal
If our lazy node does not see, after a while, a `Polka` for a round and height where it is one of the validators, this is an important indication that not enough `Prevote` messages were issued, and that our lazy node may have to issue a `Prevote` to allow the system to progress. This is particularly true when `Prevote` messages for that height and a round are available and at least one of them was produced by a correct node, i.e., when matching `Prevote` messages with a combined voting power of at least `f + 1` are available.

In order to issue a `Prevote` in a round of consensus, the node needs to retrieve and validate the value proposed in that round. In the pseudo-code, this message is a `Proposal`, while implementations probably distinguish between a `Proposal` as a consensus message and the actual proposed value `v`, propagated by the proposer. This is therefore the third priority of a lazy node: when there is no `Commit` or `Polka` in sight, it must look for a `Proposal`, preferably the most recent (highest round) one.

When a node is in this situation, it has to perform all the work expected from a validator in a round of consensus. This means that it is probably not slow or lagging behind, but caught up and responsible for carrying out the current round of consensus. It is harder to be lazy in this case without acting maliciously. So the validator needs to receive and validate the `Proposal`, issue a `Prevote`, then issue a `Precommit`, and finally decide the height. It then starts over from the next height.

Summary
So, what is all this for?
A node that is lagging behind should focus on pre-processing the messages (from a future height or round) that are most important for achieving consensus with minimal effort. This substantially reduces the backlog of pending messages, which is one of the reasons why lagging nodes, in some cases, never really catch up with the majority of validators.
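To make this pre-processing more concrete, here is a minimal sketch of how a node could scan its backlog for a certificate (`Precommit`s for a `Commit`, `Prevote`s for a `Polka`). Everything in it is an assumption made for illustration: the `Vote` fields, the `find_certificate` helper, the `power` map from validator to voting power, and the reading of `2f + 1` as "strictly more than two thirds of the total voting power". It also ignores signature validation and simply counts the first vote seen per validator and round.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Vote:
    # Hypothetical vote record; field names are illustrative only.
    kind: str        # "prevote" or "precommit"
    height: int
    round: int
    value_id: str    # identifier of the voted value
    validator: str


def find_certificate(
    votes: Iterable[Vote],
    kind: str,
    height: int,
    power: Dict[str, int],
) -> Optional[Tuple[int, str]]:
    """Return (round, value_id) of the highest-round certificate, or None.

    A certificate is a set of matching votes of the given kind whose
    combined voting power exceeds 2/3 of the total (assumed threshold).
    """
    total = sum(power.values())
    tally: Dict[Tuple[int, str], int] = defaultdict(int)
    seen = set()  # (validator, round) pairs already counted
    for v in votes:
        if v.kind != kind or v.height != height or v.validator not in power:
            continue
        if (v.validator, v.round) in seen:
            continue  # count at most one vote per validator and round
        seen.add((v.validator, v.round))
        tally[(v.round, v.value_id)] += power[v.validator]
    quorum = [key for key, p in tally.items() if 3 * p > 2 * total]
    return max(quorum) if quorum else None
```

Calling `find_certificate(backlog, "precommit", H, power)` first, and only then `find_certificate(backlog, "prevote", H, power)`, mirrors the priorities discussed above. Once a `Commit` is found for `H`, every other vote for `H` in the backlog can be dropped.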
Also, the lack of the information required to lazily progress in consensus should trigger the synchronization protocol to retrieve that information, following the priority and order sketched above. In the absence of failures, the only reason for a node not to receive this kind of information is that its backlog is so large that it loses some information that is received by the majority of the nodes. So identifying and modeling this lazy approach should help the design of the anti-entropy/synchronization protocols.
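One possible reading of this priority order, written as a decision function, is sketched below. It is not taken from any implementation: `HeightState`, its fields, the two `*_timed_out` flags (standing in for "after a while"), and the returned action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeightState:
    # Hypothetical view of what a node knows about one height.
    is_validator: bool
    has_commit: bool = False
    has_polka: bool = False
    has_proposal: bool = False
    has_value: bool = False          # full value retrieved and validated
    commit_timed_out: bool = False   # waited "a while" for a Commit
    polka_timed_out: bool = False    # waited "a while" for a Polka


def next_step(s: HeightState) -> str:
    """Pick the cheapest action that still guarantees progress."""
    # First priority: a Commit decides the height once the value is known.
    if s.has_commit:
        return "decide" if s.has_value else "request_value"
    # Non-validators, and validators that have not waited long enough,
    # keep looking for a Commit only.
    if not s.is_validator or not s.commit_timed_out:
        return "request_commit"
    # Second priority: a Polka lets a validator issue its Precommit.
    if s.has_polka:
        return "precommit" if s.has_value else "request_value"
    if not s.polka_timed_out:
        return "request_polka"
    # Third priority: the Proposal, which requires the full round of work.
    if not s.has_proposal:
        return "request_proposal"
    return "prevote" if s.has_value else "request_value"
```

The `request_*` results are where a synchronization protocol would be triggered. Keeping the decision in a pure function like this makes the priority order easy to review and to test separately from networking.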
This is of course a draft, an initial effort, and all feedback is welcome.