How to reach consensus fast and in a lazy way #370

cason · 2024-09-05T17:35:13Z

cason
Sep 5, 2024
Collaborator

This discussion should serve as a basis for the design of the node synchronization protocol, tracked in #260.

Assume that we have a node that is not a validator (at the considered heights or at all), or that is slow or to some extent lagging behind in the protocol. What is the best approach for it to progress as fast as possible, by committing valid blocks, without misbehaving but while being selfish and careless? Namely, the node does not perform any action that it is not forced to perform, while still guaranteeing progress.

Commits or Precommit certificates

The first priority of this node is to receive, or to look up in its message log, a Commit, namely, a set with 2f + 1 voting-power equivalent Precommit messages for the same height, round, and value. Once a Commit for a height is found and validated, the node does not need to consider any other vote message for that same height, whether it is newly received or already in its message log.
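As a rough illustration of the Commit lookup described above, the following sketch scans a message log for a set of matching Precommits backed by enough voting power. The `Vote` type, the `powers` mapping, and the function names are hypothetical, not taken from any implementation. The sketch also assumes that "2f + 1 voting-power equivalent" can be checked as "strictly more than 2/3 of the total voting power", and that signatures were verified beforehand.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vote:
    kind: str                 # "prevote" or "precommit"
    height: int
    round: int
    value_id: Optional[str]   # None stands for nil
    validator: str


def is_quorum(power: int, total: int) -> bool:
    # Assumption: 2f + 1 out of 3f + 1 generalizes to "more than 2/3 of the total".
    return 3 * power > 2 * total


def find_certificate(votes, powers, kind, height):
    """Return (round, value_id) of the highest-round certificate, or None.

    `powers` maps each validator to its voting power at `height`.
    With kind="precommit" a certificate is a Commit, with kind="prevote" a Polka.
    """
    total = sum(powers.values())
    tally = defaultdict(int)
    seen = set()
    for vote in votes:
        if vote.kind != kind or vote.height != height or vote.value_id is None:
            continue
        if vote.validator not in powers or (vote.validator, vote.round) in seen:
            continue  # ignore unknown validators and repeated votes in a round
        seen.add((vote.validator, vote.round))
        tally[(vote.round, vote.value_id)] += powers[vote.validator]
    matches = [key for key, power in tally.items() if is_quorum(power, total)]
    return max(matches) if matches else None
```

With four validators of equal power, three matching Precommits form a Commit while two do not. Once `find_certificate(log, powers, "precommit", H)` returns a result, all other votes for height `H` can be dropped.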

The only information that the node needs for a height for which it has a Commit is the committed value. I assume here that the propagation of votes and of values is performed in an independent way. Moreover, I assume that the Commit, or more precisely, one of the Precommits it contains, has enough information to enable the node to find out the corresponding full value.

Notice that if the node is a validator and finds a Commit for height H, its participation in height H was not needed. So, following the rationale of minimum effort, a node could just wait for enough Precommit messages to cheaply decide a height.

Polkas or Prevote certificates

If our lazy node does not see, after a while, a Commit for a height where it is one of the validators, this is an important indication that not enough Precommit messages were issued, and potentially our lazy node has to issue a Precommit to allow the system to progress. This is particularly true when Precommit messages for that height and a round are available and at least one of them was produced by a correct node, i.e., when at least f + 1 voting-power equivalent matching Precommit messages are available.
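The f + 1 condition above can be sketched as a small check. The function names and the `waited_long_enough` flag are hypothetical. The sketch assumes that "f + 1 voting-power equivalent" can be tested as "strictly more than 1/3 of the total voting power", and it leaves out how the node measures "after a while".

```python
def includes_correct_vote(power: int, total: int) -> bool:
    # Assumption: f + 1 out of 3f + 1 generalizes to "more than 1/3 of the total",
    # so at least one of the matching votes comes from a correct validator.
    return 3 * power > total


def should_stop_being_lazy(matching_power: int, total: int,
                           has_commit: bool, waited_long_enough: bool) -> bool:
    """True when a lazy validator should start contributing its own vote."""
    if has_commit:
        return False  # the height is already decided, nothing to contribute
    return waited_long_enough and includes_correct_vote(matching_power, total)
```

For example, with a total voting power of 4 (f = 1), matching votes with power 2 pass the check, while power 1 could come from a single faulty validator and does not.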

In order to issue a Precommit for a value in that height and round, the node has to see what is called a Polka, i.e., a set with 2f + 1 voting-power equivalent Prevote messages for the same height, round, and value. This is therefore the second priority of a lazy node: when there is no Commit in sight, it must look for a Polka, possibly the most recent (highest round) one.

Once it has a Polka for a height, round, and value, the node needs to retrieve the associated full value. The same considerations as for a Commit are valid here. Once a Polka and the corresponding full value are retrieved, the validator issues its Precommit. From this point, and assuming other correct nodes, the node should eventually see a Commit and decide the height.

Notice that in the absence of a Polka, which implies agreement on a valid value, but in the presence of 2f + 1 voting-power equivalent Prevote messages with nil or conflicting values, the validator can issue a Precommit for nil. The same applies when a Polka is seen but the corresponding full value is not received. This, however, is not a great option for a lazy validator and should be its last resort, as it does not contribute to a decision in the current round.
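The Precommit rules of this section can be put together in a minimal sketch for a single height and round. All names here are hypothetical. Two assumptions go beyond the text: a quorum is taken to be more than 2/3 of the total voting power, and the "last resort" Precommit for nil is only issued once a `timed_out` flag is set, while a Polka for nil itself leads directly to a Precommit for nil, as in the Tendermint pseudo-code.

```python
from collections import Counter


def decide_precommit(prevotes, powers, known_values, timed_out):
    """Pick the Precommit action for one height and round.

    prevotes:     validator -> prevoted value id (None stands for nil)
    powers:       validator -> voting power
    known_values: ids of full values the node has received and validated
    timed_out:    True once the node gave up waiting (assumed timer)
    """
    total = sum(powers.values())
    tally = Counter()
    for validator, value_id in prevotes.items():
        tally[value_id] += powers.get(validator, 0)

    def is_quorum(power):
        return 3 * power > 2 * total

    for value_id, power in tally.items():
        if not is_quorum(power):
            continue
        if value_id is None or value_id in known_values:
            return ("precommit", value_id)   # Polka with the value at hand, or Polka for nil
        # Polka seen but full value missing: fetch it, nil only as a last resort.
        return ("precommit", None) if timed_out else ("fetch_value", value_id)
    if timed_out and is_quorum(sum(tally.values())):
        return ("precommit", None)           # 2f + 1 Prevotes, but no agreement
    return ("wait", None)
```

The `fetch_value` action is where a synchronization protocol would be asked for the full value, so that the validator can avoid the Precommit for nil.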

Proposal

If our lazy node does not see, after a while, a Polka for a round and height where it is one of the validators, this is an important indication that not enough Prevote messages were issued, and potentially our lazy node has to issue a Prevote to allow the system to progress. This is particularly true when Prevote messages for that height and a round are available and at least one of them was produced by a correct node, i.e., when at least f + 1 voting-power equivalent matching Prevote messages are available.

In order to issue a Prevote in a round of consensus, the node needs to retrieve and validate the value proposed in that round. In the pseudo-code, this message is a Proposal, while implementations probably make a distinction between a Proposal as a consensus message and the actual proposed value v, propagated by the proposer. This is therefore the third priority of a lazy node: when there is no Commit or Polka in sight, it must look for a Proposal, possibly the most recent (highest round) one.

When a node is in this situation, it has to perform all the work expected from a validator in a round of consensus. This means that it is probably not slow or lagging behind, but caught up and responsible for carrying on the current round of consensus. It is harder to be lazy in this case without acting maliciously. So the validator needs to receive and validate the Proposal, issue a Prevote, then issue a Precommit, and finally decide the height, to then start over from the next height.

Notice that in the absence of a Proposal or the associated full value, the validator can Prevote for nil. This, however, is not a great option for a lazy validator and should be its last resort, as it does not contribute to a decision in the current round.
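The three priorities (Commit, then Polka, then Proposal) can be read as a simple decision ladder. The sketch below is only an illustration: `HeightView` and the action names are hypothetical, each certificate is reduced to a `(round, value_id)` pair, the associated full values are assumed to be already retrieved and validated, and the "after a while" timers are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HeightView:
    """What the node currently knows about one height (hypothetical type)."""
    is_validator: bool = False
    commit: Optional[Tuple[int, str]] = None     # (round, value_id)
    polka: Optional[Tuple[int, str]] = None      # highest-round Polka
    proposal: Optional[Tuple[int, str]] = None   # highest-round Proposal


def next_step(view: HeightView):
    # Priority 1: a Commit decides the height, nothing else matters.
    if view.commit is not None:
        return ("decide", view.commit[1])
    # A non-validator cannot contribute votes, it can only wait for a Commit.
    if not view.is_validator:
        return ("wait_for_commit", None)
    # Priority 2: a Polka allows issuing a Precommit for its value.
    if view.polka is not None:
        return ("precommit", view.polka[1])
    # Priority 3: a Proposal allows issuing a Prevote for its value.
    if view.proposal is not None:
        return ("prevote", view.proposal[1])
    # Nothing usable yet; a Prevote for nil would be the last resort.
    return ("wait_for_proposal", None)
```

Each `wait_*` outcome marks a point where missing information could be requested from peers instead of passively waiting.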

Summary

So, what is all this for?

A node that is lagging behind should focus on pre-processing the messages (from a future height or round) that are most important for achieving consensus with minimal effort. This substantially reduces the backlog of pending messages, which is one of the reasons why, in some cases, lagging nodes never really catch up with the majority of validators.

Also, the lack of the information required to lazily progress in consensus should trigger the synchronization protocol to retrieve that information, following the priority and order sketched above. In the absence of failures, the only reason for a node not to receive this kind of information is that its backlog is so large that it loses some information that is received by the majority of the nodes. So identifying and modeling this lazy approach should help the design of the anti-entropy/synchronization protocols.

This of course is a draft, an initial effort, and all feedback is welcome.
