[Feature] Add support for async OAuth token refreshes #1135

Open · renaudhartert-db wants to merge 9 commits into main

Conversation

@renaudhartert-db (Contributor) commented Jan 22, 2025

What changes are proposed in this pull request?

This PR aims to eliminate long-tail latency caused by OAuth token refreshes in scenarios where a single client is responsible for relatively high (e.g. > 1 QPS), continuous outbound traffic. The feature is disabled by default, which arguably makes this PR a functional no-op.

More precisely, the PR introduces a new token cache that attempts to always keep its token fresh by asynchronously refreshing it before it expires. We distinguish three token states:

  • fresh: The token is valid and is not close to its expiration.
  • stale: The token is valid but will expire soon.
  • expired: The token has expired and cannot be used.

Each time a request tries to access the token, we do the following:

  • If the token is fresh, return the current token;
  • If the token is stale, trigger an asynchronous refresh and return the current token;
  • If the token is expired, make a blocking refresh call to update the token and return it.

In particular, asynchronous refreshes use a lock to guarantee that there can only be one pending refresh at any given time.

The performance of the algorithm depends on the lengths of the stale and fresh periods. On the one hand, the stale period must be long enough to prevent tokens from entering the expired state. On the other hand, a long stale period reduces the length of the fresh period, thus increasing the refresh frequency.

Right now, the stale period is configured to 3 minutes by default (i.e. 5% of the expected token lifespan of 1 hour). This value might be changed in the future to guarantee that the default behavior achieves the best performance for the majority of users.
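
To make the description above concrete, here is a minimal, self-contained sketch of such a cache. Only the names cachedTokenSource, staleDuration, defaultStaleDuration, cachedToken, mu, and blockingToken appear in the diff excerpts quoted further down this page; everything else (stateOf, refreshPending, asyncRefresh, and the exact locking layout) is assumed for illustration, and the ability to disable the async path (disableAsync / WithAsyncRefresh) is omitted.

```go
package auth

import (
	"sync"
	"time"

	"golang.org/x/oauth2"
)

type tokenState int

const (
	fresh   tokenState = iota // valid and not close to expiration
	stale                     // valid but will expire soon
	expired                   // can no longer be used
)

// Assumed default: 3 minutes, i.e. 5% of the expected 1-hour token lifespan.
const defaultStaleDuration = 3 * time.Minute

type cachedTokenSource struct {
	tokenSource   oauth2.TokenSource
	staleDuration time.Duration

	mu             sync.Mutex
	cachedToken    *oauth2.Token
	refreshPending bool // at most one asynchronous refresh at a time
}

// stateOf classifies a token relative to its expiry time. A nil token or a
// zero expiry is treated as expired here for simplicity.
func stateOf(t *oauth2.Token, staleDuration time.Duration) tokenState {
	switch {
	case t == nil || !time.Now().Before(t.Expiry):
		return expired
	case !time.Now().Before(t.Expiry.Add(-staleDuration)):
		return stale
	default:
		return fresh
	}
}

func (c *cachedTokenSource) Token() (*oauth2.Token, error) {
	c.mu.Lock()
	t := c.cachedToken
	switch stateOf(t, c.staleDuration) {
	case fresh:
		c.mu.Unlock()
		return t, nil
	case stale:
		// Serve the still-valid token, but refresh it in the background.
		// refreshPending guarantees a single pending refresh at a time.
		if !c.refreshPending {
			c.refreshPending = true
			go c.asyncRefresh()
		}
		c.mu.Unlock()
		return t, nil
	default: // expired
		c.mu.Unlock()
		return c.blockingToken()
	}
}

func (c *cachedTokenSource) asyncRefresh() {
	t, err := c.tokenSource.Token() // network call, done outside the lock
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refreshPending = false
	if err == nil {
		c.cachedToken = t
	}
	// On failure the cached token keeps being served while it is valid;
	// once it expires, callers fall back to blockingToken.
}

func (c *cachedTokenSource) blockingToken() (*oauth2.Token, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// Another call (blocking or async) may already have refreshed the token.
	if stateOf(c.cachedToken, c.staleDuration) != expired {
		return c.cachedToken, nil
	}
	t, err := c.tokenSource.Token()
	if err != nil {
		return nil, err
	}
	c.cachedToken = t
	return t, nil
}
```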

For reviewers:

  • This PR only uses the new cache in control-plane auth flows; I plan to send a follow-up PR to enable asynchronous refresh in data-plane flows once this one has been merged.
  • The oauth2.TokenSource interface is likely not sufficient for us; we would need one whose Token method takes a context.Context as a parameter: Token(context.Context) (Token, error) (a possible shape is sketched below).
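
For reference, a context-aware variant of the interface could look like the following. The shape is an assumption based on the comment above; in particular, the return type is assumed to be *oauth2.Token, which may differ from the PR's own Token type.

```go
package auth

import (
	"context"

	"golang.org/x/oauth2"
)

// TokenSource mirrors oauth2.TokenSource but threads a context through,
// so that refresh calls can honor deadlines and cancellation.
type TokenSource interface {
	Token(ctx context.Context) (*oauth2.Token, error)
}
```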

How is this tested?

Complete test coverage with a focus on various concurrency scenarios.

@renaudhartert-db renaudhartert-db changed the title [Feature] Add async token cache [Feature] Add support for async OAuth token refreshes Jan 23, 2025
@renaudhartert-db renaudhartert-db marked this pull request as ready for review January 23, 2025 08:43
cts := &cachedTokenSource{
    tokenSource:   ts,
    staleDuration: defaultStaleDuration,
    disableAsync:  defaultDisableAsyncRefresh,

Contributor commented:

disable -> enable + flip the defaults

By including negation in the property name, expressions often become double negations, which are harder to read and interpret than a simple if enableAsync.

@renaudhartert-db (Author) replied Jan 27, 2025:

Agreed in general but I think this should be a little more nuanced.

Ultimately, the intended default behavior is to enable async. Using disable as the field guarantees that the zero value is the default. This makes it easy to reason about the code, as a non-zero value always corresponds to a special case, which would typically be handled at a higher level of indentation.

I'd prefer to keep the code as it is (note that the public interface WithAsyncRefresh has no negation) unless you feel strongly about it.
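
As a hedged illustration of that argument, the public option can stay positive while the internal field keeps a zero-value default. The option signature and wiring below are assumptions, not code from the PR; they assume the cachedTokenSource struct with a disableAsync field shown in the diff excerpt above.

```go
// Option configures a cachedTokenSource; the exact options API in the PR
// may differ.
type Option func(*cachedTokenSource)

// WithAsyncRefresh enables or disables asynchronous refreshes. The public
// name carries no negation; it maps onto the internal disableAsync field
// so that the struct's zero value (disableAsync == false) keeps the
// default behavior, i.e. async refresh enabled.
func WithAsyncRefresh(enabled bool) Option {
	return func(c *cachedTokenSource) {
		c.disableAsync = !enabled
	}
}
```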

config/experimental/auth/auth.go (conversation resolved)
case fresh:
    cts.mu.Lock()
    defer cts.mu.Unlock()
    return cts.cachedToken, nil

Contributor commented:

Acquiring the mutex twice is a race condition.

The cached token itself and its state are coupled and should be retrieved atomically.

@renaudhartert-db (Author) replied:

I agree that making the operation atomic would speed up the code, though I'm not sure I see what the race condition would be. It's true that we might return a token that is older than what the state suggests, but that can happen even if the operation is atomic; it is just a little more likely right now.
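
One way to make the read atomic, as the reviewer suggests, is to take the lock once and return the token together with its state. The helper name below is illustrative, and stateOf refers to the classification helper assumed in the sketch earlier on this page.

```go
// tokenAndState reads the cached token and derives its state inside a
// single critical section, so the two can never disagree.
func (c *cachedTokenSource) tokenAndState() (*oauth2.Token, tokenState) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.cachedToken, stateOf(c.cachedToken, c.staleDuration)
}
```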

    defer cts.mu.Unlock()
    return cts.cachedToken, nil
default: // expired
    return cts.blockingToken()

Contributor commented:

By locking inside this function and not deduping, the calls to Token() become sequential if there is a failure.

I think it is fair to try and acquire the token once and also cache the failure.

Doing this would also allow for a single token refresh path instead of having separate async refresh and blocking refresh functions.

@renaudhartert-db (Author) replied Jan 27, 2025:

> By locking inside this function and not deduping, the calls to Token() become sequential if there is a failure.

I might be misunderstanding your comment, but shouldn't the calls be sequential even if we were to cache the error? The blockingToken function already contains a path (lines 186-188) to avoid calling the TokenSource if a call (either blocking or async) successfully refreshed the token.

> I think it is fair to try and acquire the token once and also cache the failure.

Treating blockingToken errors as transient is intended. This guarantees that the errors returned by cachedTokenSource have the same semantics as the errors returned by the wrapped TokenSource. I'd like to keep that property for now; what do you think?
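
For completeness, here is a sketch of the kind of single, deduplicated refresh path the reviewer describes. golang.org/x/sync/singleflight is used purely for illustration; the PR itself uses a mutex with separate async and blocking refresh functions, and does not cache failures.

```go
package auth

import (
	"golang.org/x/oauth2"
	"golang.org/x/sync/singleflight"
)

// sharedRefresher funnels every refresh through one in-flight call:
// concurrent callers that find the token stale or expired share a single
// call to the wrapped TokenSource (and its error) instead of refreshing
// one after the other.
type sharedRefresher struct {
	ts    oauth2.TokenSource
	group singleflight.Group
}

func (r *sharedRefresher) refresh() (*oauth2.Token, error) {
	v, err, _ := r.group.Do("token", func() (interface{}, error) {
		return r.ts.Token()
	})
	if err != nil {
		return nil, err // the same error is returned to all concurrent callers
	}
	return v.(*oauth2.Token), nil
}
```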


If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-go

Inputs:

  • PR number: 1135
  • Commit SHA: 8ac8c6e3ed8cd1a2350c495bee1d73eca9404fe4

Checks will be approved automatically on success.
