-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storeliveness: check clock to determine support from #131113
Conversation
be3a2e0
to
17ec4ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @miraradeva)
-- commits
line 8 at r1:
"because"
pkg/kv/kvserver/storeliveness/fabric.go
line 28 at r1 (raw file):
// supports S_remote. // // Whether support is provided or not is evaluated at the time of calling
This seems unsafe to me. Raft uses SupportFor
to determine whether it can vote against the leader. If needs an absence of support to be irrevocable. If we check the expiration against the local clock, we may return false but then later grant more support to the existing epoch. Either we withdraw support synchronously (something we decided not to do) or we return true until the withdrawal timer fires.
pkg/kv/kvserver/storeliveness/fabric.go
line 46 at r1 (raw file):
// supported by S_remote. // // Whether support is provided or not is evaluated at the time of calling
This on the other hand is safe, as there is no assumption of irrevocable lack of support. An epoch may not be supported at one moment but supported again shortly thereafter.
17ec4ba
to
8133fe0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"because"
Done
pkg/kv/kvserver/storeliveness/fabric.go
line 28 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This seems unsafe to me. Raft uses
SupportFor
to determine whether it can vote against the leader. If needs an absence of support to be irrevocable. If we check the expiration against the local clock, we may return false but then later grant more support to the existing epoch. Either we withdraw support synchronously (something we decided not to do) or we return true until the withdrawal timer fires.
Yes, good catch, I didn't think that through.
pkg/kv/kvserver/storeliveness/fabric.go
line 46 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This on the other hand is safe, as there is no assumption of irrevocable lack of support. An epoch may not be supported at one moment but supported again shortly thereafter.
Agree.
8133fe0
to
e45638b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 6 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @miraradeva and @nvanbenschoten)
pkg/kv/kvserver/storeliveness/fabric.go
line 46 at r1 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
Agree.
nit: this might be worth capturing as a [*] below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @miraradeva)
pkg/kv/kvserver/storeliveness/support_manager.go
line 126 at r2 (raw file):
// A support expiration timestamp equal to the current time is considered // expired, to be consistent with the logic in maybeWithdrawSupport. if ss.Expiration.IsEmpty() || ss.Expiration.LessEq(sm.clock.Now()) {
I'm still torn on whether this is a good idea or not. It's unnecessary for the main use of SupportFrom
— evaluating LeadSupportUntil
, which will be compared against a request timestamp. I wouldn't be surprised if it becomes a performance issue, which we will determine as part of #125255.
The change is only needed for some of the newer uses of SupportFrom
being added in #130628 and #130938. As I mentioned in #130938 (review), the raw use of StoreLiveness
in raft.go
leaves something to be desired. There's also an argument that given the presence of a SupportExpired
method on StoreLiveness
, users of the interface should be expected to engage with possibility that support was given but then later expired.
An alternative to this change is to remove the bool
return value and blindly return ss.Epoch, ss.Expiration
. That would prevent misuse and force all callers to look at the value of the timestamp. I'm curious what you think about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/storeliveness/support_manager.go
line 126 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm still torn on whether this is a good idea or not. It's unnecessary for the main use of
SupportFrom
— evaluatingLeadSupportUntil
, which will be compared against a request timestamp. I wouldn't be surprised if it becomes a performance issue, which we will determine as part of #125255.The change is only needed for some of the newer uses of
SupportFrom
being added in #130628 and #130938. As I mentioned in #130938 (review), the raw use ofStoreLiveness
inraft.go
leaves something to be desired. There's also an argument that given the presence of aSupportExpired
method onStoreLiveness
, users of the interface should be expected to engage with possibility that support was given but then later expired.An alternative to this change is to remove the
bool
return value and blindlyreturn ss.Epoch, ss.Expiration
. That would prevent misuse and force all callers to look at the value of the timestamp. I'm curious what you think about that?
I've always liked the idea of getting rid of the boolean. The only good reason to keep it is something you mentioned last Thursday: the expiration can be empty, so we would need logic to handle empty vs non-empty expiration when comparing against another timestamp. I think that's ok if we do the logic in SupportExpired
.
On a separate note, should SupportExpired
be called SupportFromExpired
to make it more explicit that this should be used to compare timestamps returned by SupportFrom
? Though it's not really possible to make the mistake I did above, of comparing against the SupportFor
timestamp, because store liveness doesn't expose it.
e45638b
to
c8111af
Compare
Previously, `SupportFrom` returned a boolean indicating whether support is provided, in addition to an epoch and an expiration of the provided support. It was the caller's responsibility to not blindly trust the boolean but also to compare the returned timestamp to its clock (a timestamp in the past imples support has expired). This logic is error-prone because the caller may only consider the boolean and conclude support is provided. This commit removes the boolean return value in `SupportFrom`, and makes it explicit that the caller needs to compare the expiration timestamp to determine if support is provided. Part of: cockroachdb#125063 Release note: None
c8111af
to
72e0001
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r2, 14 of 14 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @miraradeva)
bors r=nvanbenschoten |
Previously, in response to
SupportFrom
calls, the SupportManager could returntrue
(support is provided) with an expiration timestamp in the past. It was the caller's responsibility to compare the returned timestamp to its clock. This logic is error-prone because the caller may only consider the boolean and conclude support is provided.This commit moves the comparison to the current clock time to the
SupportFrom
implementations.Part of: #125063
Release note: None