reset max query time of blocking queries in client after retries #25039

Merged: 2 commits, Feb 7, 2025

@tgross tgross commented Feb 6, 2025

When a blocking query on the client hits a retryable error, we change the max query time so that it falls within the `RPCHoldTimeout` timeout. But when the retry succeeds we don't reset it to the original value.

Because the calls to `Node.GetClientAllocs` reuse the same request struct instead of reallocating it, any retry will cause the agent to poll at a faster frequency until the agent restarts. No other RPC on the client currently has this behavior, but we'll fix this in the `rpc` method rather than in the caller so that any future users of the `rpc` method don't have to remember this detail.

Fixes: #25033
Ref: https://hashicorp.atlassian.net/browse/NET-12116


@tgross tgross force-pushed the client-rpc-retry-reset-block-time branch from 2701924 to 6d9d27d Compare February 6, 2025 20:25
@tgross tgross added the backport/ent/1.7.x+ent, backport/ent/1.8.x+ent, backport/1.9.x, and type/bug labels Feb 6, 2025
@tgross tgross added this to the 1.9.x milestone Feb 6, 2025
@tgross tgross marked this pull request as ready for review February 6, 2025 20:52
@tgross tgross requested review from a team as code owners February 6, 2025 20:52
@@ -191,3 +192,57 @@ func Test_resolveServer(t *testing.T) {
}

}

func TestRpc_RetryBlockTime(t *testing.T) {

Member:

A test!!! 🎉

Member, Author:

The test took 10x longer to write than the fix, unfortunately. We have no way of controlling the behavior of the lower layers of our RPC "stack", which is probably why we have fairly poor test coverage of the error-handling paths. 😿

@jrasell jrasell (Member) left a comment

LGTM!

@tgross tgross merged commit 5d09d7a into main Feb 7, 2025
30 checks passed
@tgross tgross deleted the client-rpc-retry-reset-block-time branch February 7, 2025 13:45
Labels: backport/ent/1.7.x+ent, backport/ent/1.8.x+ent, backport/1.9.x, theme/client, type/bug
Linked issue: RPC retries from client can alter blocking time for Node.GetClientAllocs until client restart