reset max query time of blocking queries in client after retries #25039

tgross · 2025-02-06T20:24:03Z

When a blocking query on the client hits a retryable error, we change the max query time so that it falls within `RPCHoldTimeout`. But when the retry succeeds, we don't reset it to the original value.

Because the calls to `Node.GetClientAllocs` reuse the same request struct instead of reallocating it, any retry causes the agent to poll more frequently until it restarts. No other RPC on the client currently has this behavior, but we'll fix this in the `rpc` method rather than in the caller so that any future users of the `rpc` method don't have to remember this detail.

Fixes: #25033
Ref: https://hashicorp.atlassian.net/browse/NET-12116

Contributor Checklist

Changelog Entry: If this PR changes user-facing behavior, please generate and add a changelog entry using the make cl command.
Testing: Please add tests to cover any new functionality or to demonstrate bug fixes and ensure regressions will be caught.
Documentation: If the change impacts user-facing functionality such as the CLI, API, UI, and job configuration, please update the Nomad website documentation to reflect this. Refer to the website README for docs guidelines. Please also consider whether the change requires notes within the upgrade guide.

Reviewer Checklist

Backport Labels: Please add the correct backport labels as described by the internal backporting document.
Commit Type: Ensure the correct merge method is selected, which should be "squash and merge" in the majority of situations. The main exceptions are long-lived feature branches or merges where history should be preserved.
Enterprise PRs: If this is an enterprise-only PR, please add any required changelog entry within the public repository.

client/rpc.go

schmichael · 2025-02-06T22:36:51Z

client/rpc_test.go

@@ -191,3 +192,57 @@ func Test_resolveServer(t *testing.T) {
 	}

 }
+
+func TestRpc_RetryBlockTime(t *testing.T) {


A test!!! 🎉

tgross · 2025-02-07

The test took 10x longer to write than the fix, unfortunately. We have no way of controlling the behavior of the lower layers of our RPC "stack", which is probably why we have fairly poor test coverage of the error handling paths. 😿

jrasell

LGTM!

vercel bot deployed to Preview – nomad-ui February 6, 2025 20:24

tgross force-pushed the client-rpc-retry-reset-block-time branch from 2701924 to 6d9d27d February 6, 2025 20:25

tgross added the backport/ent/1.7.x+ent (Changes are backported to 1.7.x+ent), backport/ent/1.8.x+ent (Changes are backported to 1.8.x+ent), backport/1.9.x (backport to 1.9.x release line), and type/bug labels Feb 6, 2025

tgross added this to the 1.9.x milestone Feb 6, 2025

tgross added the theme/client label Feb 6, 2025

vercel bot deployed to Preview – nomad-ui February 6, 2025 20:26

tgross mentioned this pull request Feb 6, 2025

RPC retries from client can alter blocking time for Node.GetClientAllocs until client restart #25033

Closed

tgross marked this pull request as ready for review February 6, 2025 20:52

tgross requested review from a team as code owners February 6, 2025 20:52

tgross requested review from jrasell, schmichael and Juanadelacuesta February 6, 2025 20:52

schmichael reviewed Feb 6, 2025

address comments from code review

c1916e9

tgross requested a review from schmichael February 6, 2025 21:32

vercel bot deployed to Preview – nomad-ui February 6, 2025 21:34

schmichael approved these changes Feb 6, 2025

jrasell approved these changes Feb 7, 2025

tgross merged commit 5d09d7a into main Feb 7, 2025
30 checks passed

tgross deleted the client-rpc-retry-reset-block-time branch February 7, 2025 13:45

hc-github-team-nomad-core mentioned this pull request Feb 7, 2025

Backport of reset max query time of blocking queries in client after retries into release/1.9.x #25049

Merged

6 tasks
