WIP, DO NOT MERGE: Disable proposal forwarding for lease revoke requests #16285
base: main
Conversation
DisableProposalForwardingCallback: func(m raftpb.Message) bool {
	for _, entry := range m.Entries {
		var raftReq etcdserverpb.InternalRaftRequest
		if !pbutil.MaybeUnmarshal(&raftReq, entry.Data) {
			continue
		}

		if raftReq.LeaseRevoke != nil {
			// If at least one of the entries has LeaseRevoke, the entire message m will be discarded.
			// TODO: should LeaseCheckPoint be discarded as well?
			return true
		}
	}
	return false
},
If the stale leader has not yet stepped down to follower (completely partitioned), then the proposal will be rejected due to the higher term state on the new leader, correct?
If the stale leader (etcdserver) has already stepped down to follower, why does the etcd server even bother to propose the lease revoke, only for the proposal forwarding to be stopped at the raft layer? We could have implemented this on the etcd server layer, correct?
On the other hand, this will be a breaking change, as LeaseRevoke can also be initiated by a user on demand. Not to mention that raft has to de-serialize each entry in the message, which might have a negative performance impact as well.
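For illustration, a purely server-side guard along the lines of that question might look roughly like the sketch below. The server type and its methods are hypothetical placeholders, not actual etcdserver APIs, and the follow-up comments further down discuss why such a check alone may not be enough.

package main

import "context"

// Purely illustrative server type and helpers; these are NOT real etcdserver
// APIs, just placeholders for the idea of guarding on the server layer.
type server struct{}

func (s *server) isLeader() bool { return true }

func (s *server) proposeLeaseRevoke(ctx context.Context, id int64) error { return nil }

// revokeExpiredLease sketches the alternative of checking leadership on the
// etcd server layer before proposing, instead of filtering forwarded
// proposals inside raft.
func (s *server) revokeExpiredLease(ctx context.Context, id int64) error {
	if !s.isLeader() {
		// A node that no longer believes it is the leader skips the
		// automatic revoke instead of proposing it.
		return nil
	}
	return s.proposeLeaseRevoke(ctx, id)
}

func main() {
	_ = (&server{}).revokeExpiredLease(context.Background(), 1)
}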
I would propose to:
- have a fault injection e2e test in place to reproduce the issue first.
- supply different failure scenarios and define the expected behavior with this fix.
Please let me know how I can help :)
Thanks a lot for checking this @chaochn47
If the stale leader has not yet stepped down to follower (completely partitioned), then the proposal will be rejected due to the higher term state on the new leader, correct?
Actually, the proposal (MsgProp) itself doesn't contain term information, so I think it isn't rejected by the new leader. I summarized this behavior in etcd-io/raft#88 (comment) and the succeeding comments, so it would be great if you could cross-check.
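To make that concrete, here is a stand-alone restatement of the follower-side decision for an incoming MsgProp as I understand it after etcd-io/raft#88. This is only an illustrative sketch of that reading, not the upstream raft code, and the parameter names are invented for the example.

package main

import pb "go.etcd.io/raft/v3/raftpb"

// shouldForwardProposal restates the follower-side decision for an incoming
// MsgProp, including the callback proposed in etcd-io/raft#88. Illustrative
// sketch only; the parameter names are not the raft library's fields.
func shouldForwardProposal(m pb.Message, lead uint64, disableForwarding bool, veto func(pb.Message) bool) bool {
	if lead == 0 { // raft.None: no known leader to forward to, so the proposal is dropped
		return false
	}
	if disableForwarding { // the existing DisableProposalForwarding option
		return false
	}
	// New hook: the application can veto forwarding for specific messages,
	// e.g. ones whose entries contain a LeaseRevoke request.
	if veto != nil && veto(m) {
		return false
	}
	// As discussed above, the forwarded MsgProp is not rejected by the new
	// leader based on term, so without the veto it would simply be applied.
	return true
}

func main() {
	_ = shouldForwardProposal(pb.Message{Type: pb.MsgProp}, 1, false, nil)
}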
If the stale leader (etcdserver) has already stepped down to follower, why does the etcd server even bother to propose the lease revoke, only for the proposal forwarding to be stopped at the raft layer? We could have implemented this on the etcd server layer, correct?
I think a condition check on the etcd side can still hit a schedule like: 1. the etcd logic detects that the node itself is the leader, but 2. by the time the proposal happens, its raft layer has already become a follower. It's described in etcd-io/raft#73, so I'd like you to cross-check that as well.
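A minimal, self-contained illustration of that window is below; the names are made up for the example and do not correspond to real etcdserver or raft code.

package main

import "sync/atomic"

// isRaftLeader stands in for the raft state machine's view of the node's role.
var isRaftLeader atomic.Bool

// maybeRevokeLease shows the check-then-propose window: the etcd-side check
// and the actual proposal are not atomic with respect to raft stepping down.
func maybeRevokeLease(propose func()) {
	if isRaftLeader.Load() { // 1. etcd logic still sees the node as leader
		// ... a higher-term message can arrive here and the raft layer
		// steps down to follower before the proposal is made ...
		propose() // 2. the LeaseRevoke proposal is then forwarded as MsgProp
	}
}

func main() {
	isRaftLeader.Store(true)
	maybeRevokeLease(func() {})
}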
On the other hand, this will be a breaking change, as LeaseRevoke can also be initiated by a user on demand. Not to mention that raft has to de-serialize each entry in the message, which might have a negative performance impact as well.
This is true. We would need a mechanism for prioritizing the leader if we really take this approach, and the performance overhead you point out is also introduced.
I tested this PR (and the corresponding raft side PR: etcd-io/raft#88) with the failpoint based approach: #15247 (comment)
(the commands in the comment need an additional step for attaching a lease to k1, like etcdctl put k1 v1 --lease 32698b80fed16704)
With this PR I could confirm that the lease revoke was dropped as expected: k1 still exists even after the node that had the sleep injected came back. Log messages look like this:
23:32:57 etcd2 | {"level":"info","ts":"2023-10-30T23:32:57.347423+0900","logger":"raft","caller":"[email protected]/raft.go:1689","msg":"91bc3c398fb3c146 not forwarding to leader fd422379fda50e48 at term 3 because disableProposalForwardingCallback() returned true for the message; dropping proposal"}
23:32:57 etcd2 | {"level":"warn","ts":"2023-10-30T23:32:57.347554+0900","caller":"etcdserver/server.go:879","msg":"failed to revoke lease","lease-id":"32698b80fed16704","error":"raft proposal dropped"}
23:33:04 etcd2 | {"level":"info","ts":"2023-10-30T23:33:04.347552+0900","logger":"raft","caller":"[email protected]/raft.go:1689","msg":"91bc3c398fb3c146 not forwarding to leader fd422379fda50e48 at term 3 because disableProposalForwardingCallback() returned true for the message; dropping proposal"}
23:33:04 etcd2 | {"level":"warn","ts":"2023-10-30T23:33:04.347695+0900","caller":"etcdserver/server.go:879","msg":"failed to revoke lease","lease-id":"32698b80fed16704","error":"raft proposal dropped"}
(Note that the repetition of this error should itself be avoided by fixing server.go.)
On the main branch, k1 is removed by the lease revoke and the keepalive CLI behaved like this:
~/g/s/g/etcd [main]× » bin/etcdctl lease keep-alive 32698b8108524912 23:43:31
lease 32698b8108524912 keepalived with TTL(5)
...
lease 32698b8108524912 keepalived with TTL(5)
lease 32698b8108524912 expired or revoked.
Server side logs were like this:
23:44:15 etcd1 | {"level":"error","ts":"2023-10-30T23:44:15.844122+0900","caller":"mvcc/kvstore_txn.go:313","msg":"failed to detach old lease from a key","error":"lease not found","stacktrace":"go.etcd.io/etcd/server/v3/storage/mvcc.(*storeTxnWrite).delete\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/kvstore_txn.go:313\ngo.etcd.io/etcd/server/v3/storage/mvcc.(*storeTxnWrite).deleteRange\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/kvstore_txn.go:276\ngo.etcd.io/etcd/server/v3/storage/mvcc.(*storeTxnWrite).DeleteRange\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/kvstore_txn.go:171\ngo.etcd.io/etcd/server/v3/storage/mvcc.(*metricsTxnWrite).DeleteRange\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/metrics_txn.go:46\ngo.etcd.io/etcd/server/v3/lease.(*lessor).Revoke\n\tgo.etcd.io/etcd/server/v3/lease/lessor.go:345\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*applierV3backend).LeaseRevoke\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply.go:205\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*authApplierV3).LeaseRevoke\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply_auth.go:121\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*uberApplier).dispatch\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/uber_applier.go:174\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*applierV3backend).Apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply.go:156\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*authApplierV3).Apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply_auth.go:61\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*uberApplier).Apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/uber_applier.go:117\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1929\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1826\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1131\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:919\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:842\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:187"}
It's a little bit odd because the revoking happens twice, so I'm still checking. Also, we need to tune the parameters for the lease TTL and the injected sleep time so that they don't cause a rafthttp timeout. I used 5 seconds for the TTL and 7 seconds for the sleep and got the above result. I also updated DefaultConnReadTimeout and DefaultConnWriteTimeout (https://github.com/etcd-io/etcd/blob/main/server/etcdserver/api/rafthttp/peer.go#L40-L41) to 10 seconds.
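For reference, that local tweak looks roughly like this excerpt of server/etcdserver/api/rafthttp/peer.go (a local experiment only, not part of this PR; only the two constants raised to 10 seconds are shown):

package rafthttp

import "time"

const (
	// Raised locally so the injected 7s sleep does not trip the rafthttp
	// connection timeouts during the manual test described above.
	DefaultConnReadTimeout  = 10 * time.Second
	DefaultConnWriteTimeout = 10 * time.Second
)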
Anyway, the related behavior of etcd + the raft library is quite subtle, so it would be really nice if you could verify the approach when you have time :)
This PR tries to drop lease revoke requests triggered by a stale leader node. Some requests triggered by etcd itself (not issued by clients) can be applied by the cluster even if they are issued by a stale leader, and that can cause problems like #15247. The behavior is quite subtle and I'm not 100% sure that this PR fixes the problem; to verify it, I think robustness test cases will be needed.
Note that this PR depends on the companion PR on the Raft library side: etcd-io/raft#88. It cannot be built without that Raft side change. If reviewers are interested, I'll update the go.mod files to point to the branch of my repository for easy testing.