WIP, DO NOT MERGE: Disable proposal forwarding for lease revoke requests #16285

Draft · @mitake wants to merge 3 commits into main from disable-proposal-forwarding-for-lease
Conversation

@mitake (Contributor) commented Jul 24, 2023:

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

This PR tries to drop lease revoke requests triggered by a stale leader node. Some requests triggered by etcd itself (not issued by clients) can be applied by the cluster even if they are issued by a stale leader, which can cause problems like #15247. The behavior is quite subtle and I'm not 100% sure that this PR fixes the problem; verifying it will probably need robustness test cases.

Note that this PR depends on a companion PR on the Raft library side: etcd-io/raft#88. It cannot be built without that Raft-side change. If reviewers are interested, I'll update the go.mod files to point at the branch in my repository for easier testing.
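
For context, here is a rough, self-contained sketch of the follower-side hook this PR assumes the Raft library will provide. The field and error names below mirror this PR's diff and the log output later in this thread, but the exact API is whatever etcd-io/raft#88 lands, so treat the shape as an illustration:

```go
package main

import "fmt"

// Toy stand-ins for the raftpb types, for illustration only.
type Entry struct{ Data []byte }
type Message struct{ Entries []Entry }

// follower models only the forwarding decision that etcd-io/raft#88 would add,
// next to the existing DisableProposalForwarding flag.
type follower struct {
	disableProposalForwarding         bool
	disableProposalForwardingCallback func(Message) bool
}

var errProposalDropped = fmt.Errorf("raft proposal dropped")

// stepProposal sketches what a follower's MsgProp handling would roughly do:
// consult the flag, then the per-message callback, and drop the proposal
// instead of forwarding it to the leader when either says so.
func (f *follower) stepProposal(m Message) error {
	if f.disableProposalForwarding ||
		(f.disableProposalForwardingCallback != nil && f.disableProposalForwardingCallback(m)) {
		return errProposalDropped
	}
	// ... otherwise forward the MsgProp to the current leader ...
	return nil
}

func main() {
	f := &follower{
		disableProposalForwardingCallback: func(m Message) bool {
			return len(m.Entries) > 0 // e.g. "true" when an entry decodes to a LeaseRevoke
		},
	}
	fmt.Println(f.stepProposal(Message{Entries: []Entry{{Data: []byte("lease-revoke")}}}))
}
```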

@mitake marked this pull request as draft on July 24, 2023
@mitake force-pushed the disable-proposal-forwarding-for-lease branch from dcbbb5c to f9a33a3 on July 24, 2023
@mitake force-pushed the disable-proposal-forwarding-for-lease branch 2 times, most recently from a0cc43b to 137e94a on October 23, 2023
@mitake force-pushed the disable-proposal-forwarding-for-lease branch from 137e94a to b58dfdb on October 23, 2023
Comment on lines +525 to +539
```go
DisableProposalForwardingCallback: func(m raftpb.Message) bool {
	for _, entry := range m.Entries {
		var raftReq etcdserverpb.InternalRaftRequest
		if !pbutil.MaybeUnmarshal(&raftReq, entry.Data) {
			continue
		}

		if raftReq.LeaseRevoke != nil {
			// If at least one of the entries contains a LeaseRevoke request,
			// the entire message m is discarded.
			// TODO: shouldn't LeaseCheckpoint be discarded as well?
			return true
		}
	}
	return false
},
```
@chaochn47 (Member) commented Oct 23, 2023:

If the stale leader has not yet stepped down to follower (completely partitioned), then the proposal will be rejected due to the higher term state on the new leader, correct?

If the stale leader (etcdserver) has already stepped down to follower, why does the etcd server even bother to propose the lease revoke, only for us to stop the proposal forwarding at the raft layer? We could have implemented this at the etcd server layer, correct?

On the other hand, this will be a breaking change, as LeaseRevoke can also be initiated by a user on demand. Not to mention that raft has to deserialize each entry in the message, which might have a negative performance impact as well.
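
For illustration, the server-layer alternative suggested here might look roughly like the following toy sketch (all names are hypothetical; this is not actual etcdserver code). The catch, as discussed in the reply below, is that this check can race with raft's own step-down:

```go
package main

import "fmt"

// Toy model of a guard at the etcd server layer (hypothetical names).
type server struct {
	leader bool // whether the local node still believes it is the leader
}

func (s *server) revokeExpiredLease(id int64) error {
	// A node that already knows it stepped down simply skips proposing
	// the revoke, with no raft-layer changes needed.
	if !s.leader {
		return fmt.Errorf("not leader; skipping revoke of lease %016x", id)
	}
	// ... propose a LeaseRevoke request through raft ...
	return nil
}

func main() {
	staleLeader := &server{leader: false}
	fmt.Println(staleLeader.revokeExpiredLease(0x32698b80fed16704))
}
```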

@chaochn47 (Member) commented:

I would propose to:

  1. have a fault-injection e2e test in place to reproduce the issue first.
  2. supply different failure scenarios and define the expected behavior with this fix.

@chaochn47 (Member) commented:

Please let me know how I can help :)

@mitake (Contributor, author) commented:

Thanks a lot for checking this @chaochn47

> If the stale leader has not yet stepped down to follower (completely partitioned), then the proposal will be rejected due to the higher term state on the new leader, correct?

Actually, the proposal (MsgProp) itself doesn't contain term information, so I think it isn't rejected by the new leader. I summarized this behavior in etcd-io/raft#88 (comment) and the subsequent comments, so it would be great if you could cross-check.

> If the stale leader (etcdserver) has already stepped down to follower, why does the etcd server even bother to propose the lease revoke, only for us to stop the proposal forwarding at the raft layer? We could have implemented this at the etcd server layer, correct?

I think a condition check on the etcd side is subject to a race: 1. the etcd logic detects that the node itself is the leader, but 2. by the time the proposal happens, its raft layer has already become a follower. This is described in etcd-io/raft#73, so I'd like you to cross-check that as well.
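
For illustration, the race described here, modeled as a toy with hypothetical names; the window is between the leadership check and the proposal reaching raft:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Toy model of the check-then-propose race (hypothetical names).
type node struct{ leader atomic.Bool }

func (n *node) proposeLeaseRevoke(id int64) {
	// MsgProp carries no term information, so even a proposal issued by a
	// node that has since become a follower is forwarded and accepted.
	fmt.Printf("proposing revoke of lease %016x while leader=%v\n", id, n.leader.Load())
}

func main() {
	n := &node{}
	n.leader.Store(true)
	if n.leader.Load() { // 1. etcd layer: "I am the leader"
		n.leader.Store(false) // 2. raft layer steps down in between (simulated)
		n.proposeLeaseRevoke(0x32698b80fed16704) // 3. the stale proposal still goes out
	}
}
```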

> On the other hand, this will be a breaking change, as LeaseRevoke can also be initiated by a user on demand. Not to mention that raft has to deserialize each entry in the message, which might have a negative performance impact as well.

This is true. We should have a mechanism for prioritizing the leader if we really take this approach, and the performance overhead is also introduced, as you point out.

@mitake (Contributor, author) commented:

I tested this PR (and the corresponding Raft-side PR, etcd-io/raft#88) with the failpoint-based approach from #15247 (comment).

(The commands in that comment need an additional step to attach a lease to k1, e.g. etcdctl put k1 v1 --lease 32698b80fed16704.)

With this PR I could confirm that the lease revoke was dropped as expected: k1 still exists even after the node that had the sleep injected came back. The log messages look like this:

```
23:32:57 etcd2 | {"level":"info","ts":"2023-10-30T23:32:57.347423+0900","logger":"raft","caller":"[email protected]/raft.go:1689","msg":"91bc3c398fb3c146 not forwarding to leader fd422379fda50e48 at term 3 because disableProposalForwardingCallback() returned true for the message; dropping proposal"}
23:32:57 etcd2 | {"level":"warn","ts":"2023-10-30T23:32:57.347554+0900","caller":"etcdserver/server.go:879","msg":"failed to revoke lease","lease-id":"32698b80fed16704","error":"raft proposal dropped"}
23:33:04 etcd2 | {"level":"info","ts":"2023-10-30T23:33:04.347552+0900","logger":"raft","caller":"[email protected]/raft.go:1689","msg":"91bc3c398fb3c146 not forwarding to leader fd422379fda50e48 at term 3 because disableProposalForwardingCallback() returned true for the message; dropping proposal"}
23:33:04 etcd2 | {"level":"warn","ts":"2023-10-30T23:33:04.347695+0900","caller":"etcdserver/server.go:879","msg":"failed to revoke lease","lease-id":"32698b80fed16704","error":"raft proposal dropped"}
```

(Note that the repetition of this error should itself be avoided by fixing server.go.)

On the main branch, k1 is removed by the lease revoke, and the keep-alive CLI behaved like this:

```
~/g/s/g/etcd [main]× » bin/etcdctl lease keep-alive 32698b8108524912                                                                   23:43:31
lease 32698b8108524912 keepalived with TTL(5)
...
lease 32698b8108524912 keepalived with TTL(5)
lease 32698b8108524912 expired or revoked.
```

Server-side logs looked like this:

```
23:44:15 etcd1 | {"level":"error","ts":"2023-10-30T23:44:15.844122+0900","caller":"mvcc/kvstore_txn.go:313","msg":"failed to detach old lease from a key","error":"lease not found","stacktrace":"go.etcd.io/etcd/server/v3/storage/mvcc.(*storeTxnWrite).delete\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/kvstore_txn.go:313\ngo.etcd.io/etcd/server/v3/storage/mvcc.(*storeTxnWrite).deleteRange\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/kvstore_txn.go:276\ngo.etcd.io/etcd/server/v3/storage/mvcc.(*storeTxnWrite).DeleteRange\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/kvstore_txn.go:171\ngo.etcd.io/etcd/server/v3/storage/mvcc.(*metricsTxnWrite).DeleteRange\n\tgo.etcd.io/etcd/server/v3/storage/mvcc/metrics_txn.go:46\ngo.etcd.io/etcd/server/v3/lease.(*lessor).Revoke\n\tgo.etcd.io/etcd/server/v3/lease/lessor.go:345\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*applierV3backend).LeaseRevoke\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply.go:205\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*authApplierV3).LeaseRevoke\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply_auth.go:121\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*uberApplier).dispatch\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/uber_applier.go:174\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*applierV3backend).Apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply.go:156\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*authApplierV3).Apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply_auth.go:61\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*uberApplier).Apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/uber_applier.go:117\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1929\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1826\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1131\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:919\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:842\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/[email protected]/schedule/schedule.go:187"}
```

It's a little odd that the revoke happens twice, so I'm still checking that. Also, we need to tune the lease TTL and the injected sleep time so as not to cause a rafthttp timeout. I used 5 seconds for the TTL and 7 seconds for the sleep to get the above result. I also updated DefaultConnReadTimeout and DefaultConnWriteTimeout (https://github.com/etcd-io/etcd/blob/main/server/etcdserver/api/rafthttp/peer.go#L40-L41) to 10 seconds.
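
For reference, the local tweak mentioned above would look roughly like this; 10 seconds is the experimental value used here, not the upstream default:

```go
// server/etcdserver/api/rafthttp/peer.go (local tweak for this experiment only)
package rafthttp

import "time"

var (
	// Raised to 10s so the injected 7s sleep does not trip the
	// rafthttp connection timeouts during the test.
	DefaultConnReadTimeout  = 10 * time.Second
	DefaultConnWriteTimeout = 10 * time.Second
)
```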

@mitake (Contributor, author) commented:

Anyway, the related behavior of etcd plus the raft library is quite subtle, so it would be really nice if you could verify the approach when you have time :)

@k8s-ci-robot commented:

@mitake: The following test failed; say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-etcd-verify | b58dfdb | link | true | /test pull-etcd-verify |


@k8s-ci-robot commented:

PR needs rebase.

