reset max query time of blocking queries in client after retries #25039
When a blocking query on the client hits a retryable error, we change the max query time so that it falls within the `RPCHoldTimeout` timeout. But when the retry succeeds we don't reset it to the original value. Because the calls to `Node.GetClientAllocs` reuse the same request struct instead of reallocating it, any retry will cause the agent to poll at a faster frequency until the agent restarts. No other current RPC on the client has this behavior, but we'll fix this in the `rpc` method rather than in the caller so that any future users of the `rpc` method don't have to remember this detail. Fixes: #25033
@@ -191,3 +192,57 @@ func Test_resolveServer(t *testing.T) {
}

}

func TestRpc_RetryBlockTime(t *testing.T) {
A test!!! 🎉
The test took 10x longer to write than the fix, unfortunately. Not having any way of controlling the behavior of the lower layers of the RPC "stack" we have is probably why we have fairly poor test coverage of the error handling paths. 😿
LGTM!
Ref: https://hashicorp.atlassian.net/browse/NET-12116