Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storeliveness: check clock to determine support from #131113

Merged
merged 1 commit into from
Sep 26, 2024

Conversation

miraradeva
Copy link
Contributor

@miraradeva miraradeva commented Sep 20, 2024

Previously, in response to SupportFrom calls, the SupportManager could return true (support is provided) with an expiration timestamp in the past. It was the caller's responsibility to compare the returned timestamp to its clock. This logic is error-prone because the caller may only consider the boolean and conclude support is provided.

This commit moves the comparison to the current clock time to the SupportFrom implementations.

Part of: #125063

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@miraradeva miraradeva marked this pull request as ready for review September 23, 2024 13:29
@miraradeva miraradeva requested a review from a team as a code owner September 23, 2024 13:29
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @miraradeva)


-- commits line 8 at r1:
"because"


pkg/kv/kvserver/storeliveness/fabric.go line 28 at r1 (raw file):

	// supports S_remote.
	//
	// Whether support is provided or not is evaluated at the time of calling

This seems unsafe to me. Raft uses SupportFor to determine whether it can vote against the leader. If needs an absence of support to be irrevocable. If we check the expiration against the local clock, we may return false but then later grant more support to the existing epoch. Either we withdraw support synchronously (something we decided not to do) or we return true until the withdrawal timer fires.


pkg/kv/kvserver/storeliveness/fabric.go line 46 at r1 (raw file):

	// supported by S_remote.
	//
	// Whether support is provided or not is evaluated at the time of calling

This on the other hand is safe, as there is no assumption of irrevocable lack of support. An epoch may not be supported at one moment but supported again shortly thereafter.

Copy link
Contributor Author

@miraradeva miraradeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


-- commits line 8 at r1:

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"because"

Done


pkg/kv/kvserver/storeliveness/fabric.go line 28 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This seems unsafe to me. Raft uses SupportFor to determine whether it can vote against the leader. If needs an absence of support to be irrevocable. If we check the expiration against the local clock, we may return false but then later grant more support to the existing epoch. Either we withdraw support synchronously (something we decided not to do) or we return true until the withdrawal timer fires.

Yes, good catch, I didn't think that through.


pkg/kv/kvserver/storeliveness/fabric.go line 46 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This on the other hand is safe, as there is no assumption of irrevocable lack of support. An epoch may not be supported at one moment but supported again shortly thereafter.

Agree.

@miraradeva miraradeva changed the title storeliveness: check clock to determine support for/from storeliveness: check clock to determine support from Sep 23, 2024
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 6 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miraradeva and @nvanbenschoten)


pkg/kv/kvserver/storeliveness/fabric.go line 46 at r1 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

Agree.

nit: this might be worth capturing as a [*] below.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miraradeva)


pkg/kv/kvserver/storeliveness/support_manager.go line 126 at r2 (raw file):

	// A support expiration timestamp equal to the current time is considered
	// expired, to be consistent with the logic in maybeWithdrawSupport.
	if ss.Expiration.IsEmpty() || ss.Expiration.LessEq(sm.clock.Now()) {

I'm still torn on whether this is a good idea or not. It's unnecessary for the main use of SupportFrom — evaluating LeadSupportUntil, which will be compared against a request timestamp. I wouldn't be surprised if it becomes a performance issue, which we will determine as part of #125255.

The change is only needed for some of the newer uses of SupportFrom being added in #130628 and #130938. As I mentioned in #130938 (review), the raw use of StoreLiveness in raft.go leaves something to be desired. There's also an argument that given the presence of a SupportExpired method on StoreLiveness, users of the interface should be expected to engage with possibility that support was given but then later expired.

An alternative to this change is to remove the bool return value and blindly return ss.Epoch, ss.Expiration. That would prevent misuse and force all callers to look at the value of the timestamp. I'm curious what you think about that?

Copy link
Contributor Author

@miraradeva miraradeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/storeliveness/support_manager.go line 126 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm still torn on whether this is a good idea or not. It's unnecessary for the main use of SupportFrom — evaluating LeadSupportUntil, which will be compared against a request timestamp. I wouldn't be surprised if it becomes a performance issue, which we will determine as part of #125255.

The change is only needed for some of the newer uses of SupportFrom being added in #130628 and #130938. As I mentioned in #130938 (review), the raw use of StoreLiveness in raft.go leaves something to be desired. There's also an argument that given the presence of a SupportExpired method on StoreLiveness, users of the interface should be expected to engage with possibility that support was given but then later expired.

An alternative to this change is to remove the bool return value and blindly return ss.Epoch, ss.Expiration. That would prevent misuse and force all callers to look at the value of the timestamp. I'm curious what you think about that?

I've always liked the idea of getting rid of the boolean. The only good reason to keep it is something you mentioned last Thursday: the expiration can be empty, so we would need logic to handle empty vs non-empty expiration when comparing against another timestamp. I think that's ok if we do the logic in SupportExpired.

On a separate note, should SupportExpired be called SupportFromExpired to make it more explicit that this should be used to compare timestamps returned by SupportFrom? Though it's not really possible to make the mistake I did above, of comparing against the SupportFor timestamp, because store liveness doesn't expose it.

Previously, `SupportFrom` returned a boolean indicating whether support
is provided, in addition to an epoch and an expiration of the provided
support. It was the caller's responsibility to not blindly trust the
boolean but also to compare the returned timestamp to its clock
(a timestamp in the past imples support has expired). This logic is
error-prone because the caller may only consider the boolean and
conclude support is provided.

This commit removes the boolean return value in `SupportFrom`, and makes
it explicit that the caller needs to compare the expiration timestamp
to determine if support is provided.

Part of: cockroachdb#125063

Release note: None
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 4 files at r2, 14 of 14 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @miraradeva)

@miraradeva
Copy link
Contributor Author

bors r=nvanbenschoten

@craig craig bot merged commit 343bd46 into cockroachdb:master Sep 26, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants