Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

models: adjust MoreThanFNodesCommitted check #2

Merged
merged 1 commit into from
Mar 14, 2023

Conversation

AnnaShaleva
Copy link

@AnnaShaleva AnnaShaleva commented Mar 6, 2023

Based on the neo-project/neo-modules#792 (comment), the definition of MoreThanFNodesCommitted should be adjusted to match the core algorithm as it's the key factor of the four-good-nodes liveness lock scenario. There following change is made:

  • We consider the node to be "Committed" if it has the Commit message sent at any view. If the good node has committed, then we know for sure that it won't go further to the next consensus round and can rely on this information.

The thing that remains the same is that we do not count the "lost" nodes, because this information can't be reliably trusted. See the comment inside the commit.

Based on this adjustment, the first liveness lock scenario mentioned in neo-project/neo-modules#792 (comment) ("Liveness lock with four non-faulty nodes") needs to include one more step to enter a deadlock: one of the replicas that in the "cv" state must die in the end of the scenario.

Moreover, there's another liveness lock scenario when one of the nodes is in the RMDead list, i.e. can "die" at any moment. Consider running the base model specification with the following configuration:

RM      	RMFault 	RMDead 	MaxView
{0, 1, 2, 3} 	{}      	{0} 	2

formal-models/dbft/dbft.tla Outdated Show resolved Hide resolved
formal-models/dbft/dbft.tla Show resolved Hide resolved
@AnnaShaleva AnnaShaleva force-pushed the adjust-more-than-f branch 3 times, most recently from 87b603f to ffc9679 Compare March 6, 2023 12:36
Based on the neo-project/neo-modules#792 (comment),
the definition of MoreThanFNodesCommitted should be adjusted to match
the core algorithm as it's the key factor of the four-good-nodes liveness lock
scenario. There following change is made:

* We consider the node to be "Committed" if it has the Commit message
  sent at _any_ view. If the good node has committed, then we know for sure
  that it won't go further to the next consensus round and can rely on
  this information.

The thing that remains the same is that we do not count the "lost" nodes, because
this information can't be reliably trusted. See the comment inside the commit.

Based on this adjustment, the first liveness lock scenario mentioned in
neo-project/neo-modules#792 (comment)
("Liveness lock with four non-faulty nodes") needs to include one more step
to enter a deadlock: one of the replicas that in the "cv" state must die in the
end of the scenario.

Moreover, there's another liveness lock scenario when one of the nodes is in
the RMDead list, i.e. can "die" at any moment. Consider running the base model
specification with the following configuration:
```
RM      	RMFault 	RMDead 	MaxView
{0, 1, 2, 3} 	{}      	{0} 	2
```
AnnaShaleva added a commit to AnnaShaleva/dbft that referenced this pull request Mar 6, 2023
Finally, the reason of the following real-life liveness lock in the four nodes
scenario is found:
```
rmState |-> ( 0 :> [type |-> "cv", view |-> 1] @@
  1 :> [type |-> "cv", view |-> 1] @@
  2 :> [type |-> "commitSent", view |-> 0] @@
  3 :> [type |-> "commitSent", view |-> 1] )
```
The issue was discovered and fixed in the TLA+ model initially, see the
roman-khimov#2.

The decision on preparation payloads rejection/acceptance should be based on
the number of Commits accepted durnig the whole set of rounds, not during
the current round only.
AnnaShaleva added a commit to AnnaShaleva/dbft that referenced this pull request Mar 6, 2023
Finally, the reason of the following real-life liveness lock in the four nodes
scenario is found:
```
rmState |-> ( 0 :> [type |-> "cv", view |-> 1] @@
  1 :> [type |-> "cv", view |-> 1] @@
  2 :> [type |-> "commitSent", view |-> 0] @@
  3 :> [type |-> "commitSent", view |-> 1] )
```
The issue was discovered and fixed in the TLA+ model initially, see the
roman-khimov#2.

The decision on preparation payloads rejection/acceptance should be based on
the number of Commits accepted durnig the whole set of rounds, not during
the current round only.

See also the C# source code, it doesn't have such problem:
https://github.com/neo-project/neo-modules/blob/d00d90b9c27b3d0c3c57e9ca1f560a09975df241/src/DBFTPlugin/Consensus/ConsensusContext.cs#L223
@roman-khimov roman-khimov merged commit c8d9ba6 into roman-khimov:master Mar 14, 2023
AnnaShaleva added a commit to AnnaShaleva/dbft that referenced this pull request Mar 14, 2023
Finally, the reason of the following real-life liveness lock in the four nodes
scenario is found:
```
rmState |-> ( 0 :> [type |-> "cv", view |-> 1] @@
  1 :> [type |-> "cv", view |-> 1] @@
  2 :> [type |-> "commitSent", view |-> 0] @@
  3 :> [type |-> "commitSent", view |-> 1] )
```
The issue was discovered and fixed in the TLA+ model initially, see the
roman-khimov#2.

The decision on preparation payloads rejection/acceptance should be based on
the number of Commits accepted durnig the whole set of rounds, not during
the current round only.

See also the C# source code, it doesn't have such problem:
https://github.com/neo-project/neo-modules/blob/d00d90b9c27b3d0c3c57e9ca1f560a09975df241/src/DBFTPlugin/Consensus/ConsensusContext.cs#L223
AnnaShaleva added a commit to AnnaShaleva/dbft that referenced this pull request Mar 15, 2023
Finally, the reason of the following real-life liveness lock in the four nodes
scenario is found:
```
rmState |-> ( 0 :> [type |-> "cv", view |-> 1] @@
  1 :> [type |-> "cv", view |-> 1] @@
  2 :> [type |-> "commitSent", view |-> 0] @@
  3 :> [type |-> "commitSent", view |-> 1] )
```
The issue was discovered and fixed in the TLA+ model initially, see the
roman-khimov#2.

The decision on preparation payloads rejection/acceptance should be based on
the number of Commits accepted durnig the whole set of rounds, not during
the current round only.

See also the C# source code, it doesn't have such problem:
https://github.com/neo-project/neo-modules/blob/d00d90b9c27b3d0c3c57e9ca1f560a09975df241/src/DBFTPlugin/Consensus/ConsensusContext.cs#L223
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants