TLA+ Trace validation #113
Conversation
force-pushed from 148082d to 02e685a
force-pushed from 4ab49b4 to a66c3e6
Out of curiosity, have you found any interesting bugs in etcd/raft with this test?
No, I have not found any bugs so far.
force-pushed from 7f95a88 to 1e9254f
Rebased onto etcd-io/raft main.
force-pushed from 1e9254f to 5568437
@joshuazh-x I'll give it a closer look this week.
force-pushed from 614c0dd to 166e002
@pav-kv any comment?
ping @pav-kv
From the discussion on Slack (https://kubernetes.slack.com/archives/C3HD8ARJ5/p1711367871600289), @ahrtr asked if we can confirm that TLA+ can reproduce the etcd durability issue in etcd-io/etcd#14370. @joshuazh-x amazingly already provided a repro: joshuazh-x@6aefcc8. I think this already confirms how having a TLA+ model could benefit etcd. We could continue to iterate on the PR, but I don't think that's needed, as it already brings value and should not impact the raft codebase. There are many improvements that we could implement:
However, I don't think we need everything at once; I can work with @joshuazh-x to iterate on it.
One thing: I would like to ensure that I can follow the instructions provided in the README and run the validation using the script. Left a comment.
Cannot run ./validate.sh; please provide instructions on how to get a trace.
```diff
@@ -761,6 +771,8 @@ func (r *raft) appliedSnap(snap *pb.Snapshot) {
 // index changed (in which case the caller should call r.bcastAppend). This can
 // only be called in StateLeader.
 func (r *raft) maybeCommit() bool {
+	defer traceCommit(r)
```
Note that it's `maybeCommit`, which means that it may not commit. But you always trace the commit?
The commit trace here makes the state machine step with the `AdvanceCommitIndex` action, which contains the same logic as `maybeCommit`. We expect the state machine to end up in the same state as after calling `maybeCommit`.
- I do not see the "same logic".
- Do you mean that you always need to record a `rsmCommit` event in the trace even if raft doesn't really commit, although it called `(*raft) maybeCommit()`?
`rsmCommit` tells the trace validator to try advancing the commit index, as the `AdvanceCommitIndex` action does in the spec. As you can see, `newCommitIndex` will have the same value as the current `commitIndex` if no uncommitted entries have quorum acks in the current term. This is the same behavior as `maybeCommit`.
```
AdvanceCommitIndex(i) ==
    /\ state[i] = Leader
    /\ LET \* The set of servers that agree up through index.
           Agree(index) == {k \in GetConfig(i) : matchIndex[i][k] >= index}
           logSize == Len(log[i])
           \* logSize == MaxLogLength
           \* The maximum indexes for which a quorum agrees
           agreeIndexes == {index \in 1..logSize :
                                Agree(index) \in Quorum(GetConfig(i))}
           \* New value for commitIndex'[i]
           newCommitIndex ==
               IF /\ agreeIndexes /= {}
                  /\ log[i][Max(agreeIndexes)].term = currentTerm[i]
               THEN
                   Max(agreeIndexes)
               ELSE
                   commitIndex[i]
       IN
           /\ CommitTo(i, newCommitIndex)
    /\ UNCHANGED <<messageVars, serverVars, candidateVars, leaderVars, log, configVars, durableState>>
```
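To make the correspondence concrete, here is a minimal, self-contained Go sketch of the rule both sides encode (illustrative only, not the actual etcd/raft code): the leader may advance its commit index only to an index acknowledged by a quorum whose entry is from the current term; otherwise the commit index stays put, just as `newCommitIndex` falls back to `commitIndex[i]` above.

```go
package main

import (
	"fmt"
	"sort"
)

// maybeCommitSketch mirrors the spec's AdvanceCommitIndex: find the highest
// log index replicated on a quorum, and advance commitIndex only if the
// entry at that index is from the current term. All names are illustrative.
func maybeCommitSketch(matchIndex []uint64, entryTerm func(uint64) uint64, currentTerm, commitIndex uint64) (uint64, bool) {
	// Sort acknowledgements in descending order; the value at the median
	// position is acknowledged by a majority (a quorum).
	acks := append([]uint64(nil), matchIndex...)
	sort.Slice(acks, func(i, j int) bool { return acks[i] > acks[j] })
	quorumIndex := acks[len(acks)/2]

	// Equivalent to the spec's log[i][Max(agreeIndexes)].term = currentTerm[i]
	// guard: only entries from the leader's current term commit directly.
	if quorumIndex > commitIndex && entryTerm(quorumIndex) == currentTerm {
		return quorumIndex, true
	}
	return commitIndex, false // no change, like the ELSE commitIndex[i] branch
}

func main() {
	terms := []uint64{1, 1, 2, 2, 2} // term of each log entry (1-indexed below)
	entryTerm := func(idx uint64) uint64 { return terms[idx-1] }
	// Three nodes acknowledged up to indexes 5, 4, and 2; the quorum index is 4.
	idx, advanced := maybeCommitSketch([]uint64{5, 4, 2}, entryTerm, 2, 3)
	fmt.Println(idx, advanced) // 4 true
}
```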
The example trace isn't correct?
Quoting from the issue comment by @pav-kv - #111 (comment)
I believe that the code changes can be completely decoupled from the raft codebase. @joshuazh-x, did you explore other ways of obtaining the trace without changing the code? We explored a very similar approach where we sample traces and simulate the TLA model using the traces. We go further and use coverage on the model to fuzz-test the raft library. You can find our implementation here - https://github.com/zeu5/raft-fuzzing/blob/master/raft.go. There has been some contrasting work where the traces are generated from the model and run on the implementation. This requires more involved instrumentation that controls the running of the Go code in a step-by-step fashion - https://dl.acm.org/doi/abs/10.1145/3552326.3587442. More crucially, they found certain issues with the original TLA+ spec used here. Unfortunately, I observe that the same bugs exist in the TLA+ model here.
I see your comment in the sig-etcd channel. I think the benefit of instrumenting/changing raft code is that it can be more easily reused by all applications & test systems.
To summarise my discussion in Slack with @ahrtr: the instrumentation (light and safe) will be a feature of the raft library which users can use to collect traces and run the model checker. It is also useful in the etcd e2e tests. However, there is one main concern with the mechanism by which the traces are collected. If traces are collected per node and used to check the correctness of the states observed in that node, the interleaving of messages is lost. With this we risk not identifying a large class of bugs where there is an inconsistency between the states of two or more nodes, for example, no two leaders in the same term. Therefore any trace collection mechanism should be central (to all nodes) and preserve the exact interleaving of events observed. The example trace suggests that there is an interleaving of events; whether the interleaving is exactly as it occurred in the execution is unknown.
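As a sketch of that design point (hypothetical names, not this PR's implementation): all nodes write to one shared collector that stamps each event with a global sequence number, so the recorded trace preserves the cross-node interleaving instead of a per-node ordering.

```go
package main

import (
	"fmt"
	"sync"
)

// event is one trace record; Seq is a global, cross-node sequence number.
type event struct {
	Seq    uint64
	NodeID uint64
	Name   string // e.g. "BecomeLeader", "AdvanceCommitIndex"
}

// collector is shared by every node in the cluster under test, so events
// appear in the trace in exactly the order they were recorded, rather than
// in per-node orderings that lose interleavings between nodes.
type collector struct {
	mu     sync.Mutex
	seq    uint64
	events []event
}

func (c *collector) record(nodeID uint64, name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.seq++
	c.events = append(c.events, event{Seq: c.seq, NodeID: nodeID, Name: name})
}

func main() {
	c := &collector{}
	c.record(1, "BecomeLeader")
	c.record(2, "BecomeLeader") // two leaders: detectable only with a global order
	for _, ev := range c.events {
		fmt.Printf("%d node=%d %s\n", ev.Seq, ev.NodeID, ev.Name)
	}
}
```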
force-pushed from d31be33 to 64ced68
@serathius @ahrtr Sorry, I was swamped with something else lately. I updated the PR following my last catch-up with @ahrtr. Hopefully I did not miss anything important.
@ahrtr @lemmy has some great ideas on solving this issue. Basically, we will need to validate this in a non-deterministic way. I'll submit another PR after this one is in.
@serathius @ahrtr #192 is the draft PR I mentioned to generate traces for test.
This is really useful, thanks for your work!
I had a quick question to make sure I'm doing this right: I tried running `verify-model.sh` and I get:
```
Starting... (2024-04-07 14:49:41)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-04-07 14:49:41.
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 3.3E-19
7 states generated, 1 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 1.
The average outdegree of the complete state graph is 0 (minimum is 0, the maximum 0 and the 95th percentile is 0).
```
I see only one state and a graph depth of 1, is that right? Or could that be because I'm not running it long enough? Also, how can I run it for longer?
That doesn't seem right. The output should be something like this:
Could you share the full output?
@joshuazh-x thanks, full output here
force-pushed from 64ced68 to f16e02a
I forgot to set its initial configuration; fixed it in the new commit. BTW, `MCetcdraft.tla`/`.cfg` is recommended for model checking, as it sets reasonable boundaries to limit the search to a small scope. It also includes the bootstrapping logic from `node.go`.
force-pushed from 6fde2d7 to 12b5a35
Overall looks good to me. I am OK to merge this PR once other maintainers have no objection. After we merge this PR, I suggest moving all the TLA+ specs into a dedicated separate repo, something like …
LGTM
Please note that directories can also have dedicated maintainers; let's keep the TLA spec in raft for as long as we are iterating on the test, to avoid the friction of coordinating between two repos. We can decide later.
Signed-off-by: Joshua Zhang <[email protected]>
force-pushed from 12b5a35 to 195edd6
OK, we can revisit this later. Hopefully the TLA spec maintainers will eventually have permission to merge spec changes independently.
LGTM. Thanks for the great work!
So happy to see this merged, thank you @joshuazh-x for your work!
This is a great demonstration of TLA+ "real world usage". I hope it gets more publicized and mentioned. |
Per our understanding, to guarantee the correctness of a service based on this raft library, we need to ensure at least two things: the correctness of the consensus algorithm, and the correctness of its implementation.
We know that, using the TLC model-checking tool, we can verify the correctness of the algorithm by exploring the state space of the finite-state-machine model. You can refer to #112 for the spec that aligns with the existing raft library implementation. Note that the spec is also included in this PR.
This PR proposes a method to address the second part of the question: how to verify that the implementation aligns with the algorithm. It leverages the idea of TLA+ trace validation from Microsoft CCF. The basic idea is to constrain the state space to the states and transitions that the service actually walks through. We add special trace log points in the library to record the algorithm-relevant states and transitions. A trace-validation spec walks the core algorithm state machine following the transitions specified by these traces. If it encounters any trace whose state or transition is not allowed by the core algorithm state machine, a mismatch between the implementation and the algorithm has been found.
Such a mismatch implies one of two things: an implementation issue, or an outdated model. If we can guarantee the correctness of the model, then there must be something wrong in the implementation. We believe this will be especially useful for avoiding regressions when implementing new features.
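To make the mechanism concrete, here is a hedged Go sketch of what a trace log point could emit (field names and schema are illustrative, not necessarily what this PR uses): one JSON record per state transition, carrying the algorithm-relevant state the trace-validation spec needs to replay the step.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// traceEvent is an illustrative trace record: which spec action the
// implementation claims to have taken, on which node, and the resulting
// algorithm-relevant state.
type traceEvent struct {
	Name        string `json:"name"` // spec action, e.g. "AdvanceCommitIndex"
	NodeID      uint64 `json:"nid"`
	Term        uint64 `json:"term"`
	CommitIndex uint64 `json:"cidx"`
}

// emit writes one JSON object per line; the trace-validation spec would read
// these records back and check each one against the allowed transitions.
func emit(ev traceEvent) {
	b, err := json.Marshal(ev)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}

func main() {
	emit(traceEvent{Name: "AdvanceCommitIndex", NodeID: 1, Term: 2, CommitIndex: 5})
}
```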