-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: mark supporting followers as recently active #130938
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/raft/tracker/supporttracker.go
line 107 at r1 (raw file):
} func (st *SupportTracker) IsSuppotedBy(id pb.PeerID, currentEpoch pb.Epoch) bool {
Skip reviewing this as it's part of #130628
c9f11bc
to
80d54a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @iskettaneh)
pkg/raft/raft.go
line 1443 at r2 (raw file):
return nil case pb.MsgCheckQuorum: quorumActiveByFortification := r.supportTracker.QuorumActive()
How do we feel about this still existing, now that r.trk.QuorumActive
includes information from store liveness? Is the log message important enough to warrant preserving the logic, instead of replacing it with a call to markSupportingFollowersAsRecentlyActive
and then deferring to r.trk.QuorumActive
?
@arulajmani I think you may have had thoughts about this? I'm fine either way.
pkg/raft/raft.go
line 2424 at r2 (raw file):
// isLeaderSupportedBy returns true if the follower is supporting the leader. func (r *raft) isLeaderSupportedBy(id pb.PeerID) bool { if !r.storeLiveness.SupportFromEnabled() {
Why check this again?
pkg/raft/raft.go
line 2432 at r2 (raw file):
// epoch recorded at the leader for that follower. curEpoch, _, ok := r.storeLiveness.SupportFrom(id) return ok && r.supportTracker.IsSuppotedBy(id, curEpoch)
We don't seem to have a very disciplined relationship between raft
, storeLiveness
, and SupportTracker
. Why is this a method on raft
but LeadSupportUntil
is a method on SupportTracker
? Where do we draw the line between them?
Should SupportTracker.IsSupportedBy
exist? Or does it's existence point at a leaky abstraction? Should joins between store liveness and fortification exist entirely within the SupportTracker
or entirely outside of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @nvanbenschoten)
pkg/raft/raft.go
line 1443 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
How do we feel about this still existing, now that
r.trk.QuorumActive
includes information from store liveness? Is the log message important enough to warrant preserving the logic, instead of replacing it with a call tomarkSupportingFollowersAsRecentlyActive
and then deferring tor.trk.QuorumActive
?@arulajmani I think you may have had thoughts about this? I'm fine either way.
I think this one's on me, I was the one pushing for r.supportTracker.QuorumActive
to still be a thing, even though we could defer to r.trk.QuorumActive
. The idea was that we'd like to get rid of this "quorumActiveByHeartbeats" stuff in the future entirely, once there's no going back to a pre-fortified world. In that world, it would make sense to stop using pr.RecentlyActive
stuff entirely (here, and above raft), and instead use StoreLiveness
directly instead.
So if we're going to a world where we get rid of pr.RecentlyActive
entirely, I think it makes sense to leave this quourumActiveByHeartbeats
stuff on the side; it'll make it easier to delete this code in the future.
pkg/raft/raft.go
line 2432 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We don't seem to have a very disciplined relationship between
raft
,storeLiveness
, andSupportTracker
. Why is this a method onraft
butLeadSupportUntil
is a method onSupportTracker
? Where do we draw the line between them?Should
SupportTracker.IsSupportedBy
exist? Or does it's existence point at a leaky abstraction? Should joins between store liveness and fortification exist entirely within theSupportTracker
or entirely outside of it?
Note: the comment still uses the SupportTracker
terminology instead of the FortificationTracker
terminology we decided to switch to in #130628.
I had a similar comment on https://reviewable.io/reviews/cockroachdb/cockroach/130628. My preference was for interactions with StoreLiveness
on the leader (calls to SupportFrom
) to be pushed inside the SupportTracker
, so all joins between store liveness and fortification exist entirely inside the SupportTracker
.
The preference to push interactions with store liveness inside the SupportTracker
, as opposed to lifting them up in raft
, follows from wanting LeadSupportUntil
to live on the SupportTracker
. It follows a similar pattern to other trackers we have in raft, where we're tracking some state and joining it against the configuration to answer some question. I could make a case for lifting interactions with store liveness in raft
as well -- but I agree, we should do one or the other -- we shouldn't be joining in two different places.
4173b09
to
defb43f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/raft/raft.go
line 1443 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I think this one's on me, I was the one pushing for
r.supportTracker.QuorumActive
to still be a thing, even though we could defer tor.trk.QuorumActive
. The idea was that we'd like to get rid of this "quorumActiveByHeartbeats" stuff in the future entirely, once there's no going back to a pre-fortified world. In that world, it would make sense to stop usingpr.RecentlyActive
stuff entirely (here, and above raft), and instead useStoreLiveness
directly instead.So if we're going to a world where we get rid of
pr.RecentlyActive
entirely, I think it makes sense to leave thisquourumActiveByHeartbeats
stuff on the side; it'll make it easier to delete this code in the future.
I kept it for now as it might help us differentiate if the CheckQuorum failed due to fortification issue, or due to messages issue (even though they are coupled together).
Let me know if you strongly disagree with keeping it.
pkg/raft/raft.go
line 2432 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Note: the comment still uses the
SupportTracker
terminology instead of theFortificationTracker
terminology we decided to switch to in #130628.I had a similar comment on https://reviewable.io/reviews/cockroachdb/cockroach/130628. My preference was for interactions with
StoreLiveness
on the leader (calls toSupportFrom
) to be pushed inside theSupportTracker
, so all joins between store liveness and fortification exist entirely inside theSupportTracker
.The preference to push interactions with store liveness inside the
SupportTracker
, as opposed to lifting them up inraft
, follows from wantingLeadSupportUntil
to live on theSupportTracker
. It follows a similar pattern to other trackers we have in raft, where we're tracking some state and joining it against the configuration to answer some question. I could make a case for lifting interactions with store liveness inraft
as well -- but I agree, we should do one or the other -- we shouldn't be joining in two different places.
That is addressed in the refortification PR
4704ba6
to
b9230ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @nvanbenschoten)
pkg/raft/raft.go
line 1083 at r3 (raw file):
} // Mark the followers who support the leader as recently active. This is
"Mark fortifying followers as recently active. We disable heartbeats when leader fortification is enabled, instead deferring to StoreLiveness for failure detection. As such, if there is no append activity for a raft group, it's possible for the leader to not communicate with a follower in a electionTimeout. We do not want to infer that the leader can't communicate with a follower in such cases, which could then cause the leader to spuriously step down because of CheckQuorum. Instead, we compute RecentlyActive
based on StoreLiveness instead.
pkg/raft/raft.go
line 1519 at r3 (raw file):
quorumActiveByHeartbeats := r.trk.QuorumActive() quorumActiveByFortification := r.fortificationTracker.QuorumActive() if !quorumActiveByHeartbeats {
nit: let's revert this?
pkg/raft/raft.go
line 1534 at r3 (raw file):
// active on heartbeat timeout. if !r.trk.QuorumActive() && !quorumActiveByFortification { r.logger.Debugf(
nit: let's revert this log line as well.
pkg/raft/raft.go
line 1557 at r3 (raw file):
// After we marked the followers as inactive, we can mark the followers who // are fortifying the leader as recently active now. This avoids having a // point in time where the follower is marked as not recently active until
This seems to diverge from the current behaviour -- is there a benefit of doing this?
pkg/raft/raft.go
line 2506 at r3 (raw file):
r.trk.Visit(func(id pb.PeerID, pr *tracker.Progress) { if isFortified, _ := r.fortificationTracker.IsFortifiedBy(id); isFortified {
nit: we could early return if pr.RecentlyActive
is already set to true.
f26569f
to
919c330
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/raft/raft.go
line 1083 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
"Mark fortifying followers as recently active. We disable heartbeats when leader fortification is enabled, instead deferring to StoreLiveness for failure detection. As such, if there is no append activity for a raft group, it's possible for the leader to not communicate with a follower in a electionTimeout. We do not want to infer that the leader can't communicate with a follower in such cases, which could then cause the leader to spuriously step down because of CheckQuorum. Instead, we compute
RecentlyActive
based on StoreLiveness instead.
Done.
pkg/raft/raft.go
line 1557 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
This seems to diverge from the current behaviour -- is there a benefit of doing this?
Discussed offline. Decided to remove it to avoid changing the current behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r3, 2 of 3 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @iskettaneh and @nvanbenschoten)
pkg/raft/testdata/fortification_checkquorum.txt
line 41 at r4 (raw file):
# Even if we haven't received any messages from the follower since last tick, # we are still okay because the follower is supporting us which means that it's
"is fortifying us"
Also consider printing the fortification state here to make this obvious in the test (print-fortification-state
.
This commit marks followers who support the current leader as recently active. This is very important because if we stop heartbeats, we might not send any MsgApp to the followers because there is nothing new to append. This means that we will mark the followers as inactive, even though that they are technically active and supporting the leader. Fixes: cockroachdb#131349 Release note: None
919c330
to
420b28e
Compare
bors r+ |
This commit marks followers who support the current leader as recently active. This is very important because if we stop heartbeats, we might not send any MsgApp to the followers because there is nothing new to append. This means that we will mark the followers as inactive, even though that they are technically active and supporting the leader.
Epic: None
Release note: None