Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry more Google API failures. #2196

Merged
merged 1 commit into from
Jul 1, 2024
Merged

Conversation

jpd236
Copy link
Contributor

@jpd236 jpd236 commented Jun 28, 2024

The default retry policy (documented
here) does not retry POST requests. But most Google API calls use POST requests under the covers, including read-only APIs like Drive
Activity
. So we can also ask to retry POST requests.

We retain other aspects of the default policy, such as the number of attempts (3) and the HTTP status codes to retry (100-109, 408, 429, 5xx). If a request succeeds on a retry, the failure is kept silent and does not pollute the logs.

The main risk is that a POST method might mutate some state despite returning a failure, and that inline retries might handle that incorrectly or otherwise make things worse than retrying the full flow from the ground up. This doesn't seem likely to be an issue with the current set of API calls, but if desired, we could narrow this change to specific known-safe requests.

See #2160
See #2161

Copy link
Member

@ebroder ebroder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Noting just for the record that we may end up needing to add some error handling logic around stuff like Drive permissions, where the APIs may not be idempotent but a "resource already exists" error shouldn't cause us to blow up. But this seems strictly better than that case

@ebroder
Copy link
Member

ebroder commented Jun 28, 2024

Not sure why builds aren't running on this commit. Maybe amend and re-push and see if it kicks off?

@jpd236
Copy link
Contributor Author

jpd236 commented Jun 28, 2024

Thanks!

Noting just for the record that we may end up needing to add some error handling logic around stuff like Drive permissions, where the APIs may not be idempotent but a "resource already exists" error shouldn't cause us to blow up. But this seems strictly better than that case

Yep, definitely possible there'll be more to do here. FWIW, I did try API Explorer with the Drive permissions.create API, and it seems to be idempotent in that repeated calls return 200 with the same Permission object. So this does feel like it should be enough to help with #2160 (and #2161, which is readonly).

Not sure why builds aren't triggering here, but on my fork I'm seeing an error:

Error: imports/server/googleClientRefresher.ts(111,7): error TS2769: No overload matches this call.
  The last overload gave the following error.
    Type 'import("/home/runner/work/jolly-roger/jolly-roger/node_modules/gaxios/build/src/common").RetryConfig' is not assignable to type 'import("/home/runner/work/jolly-roger/jolly-roger/node_modules/googleapis-common/node_modules/gaxios/build/src/common").RetryConfig'.
      Types of property 'onRetryAttempt' are incompatible.
        Type '((err: import("/home/runner/work/jolly-roger/jolly-roger/node_modules/gaxios/build/src/common").GaxiosError<any>) => void | Promise<void>) | undefined' is not assignable to type '((err: import("/home/runner/work/jolly-roger/jolly-roger/node_modules/googleapis-common/node_modules/gaxios/build/src/common").GaxiosError<any>) => void | Promise<void>) | undefined'.
          Type '(err: import("/home/runner/work/jolly-roger/jolly-roger/node_modules/gaxios/build/src/common").GaxiosError<any>) => void | Promise<void>' is not assignable to type '(err: import("/home/runner/work/jolly-roger/jolly-roger/node_modules/googleapis-common/node_modules/gaxios/build/src/common").GaxiosError<any>) => void | Promise<void>'.
            Types of parameters 'err' and 'err' are incompatible.
              Property '[GAXIOS_ERROR_SYMBOL]' is missing in type 'import("/home/runner/work/jolly-roger/jolly-roger/node_modules/googleapis-common/node_modules/gaxios/build/src/common").GaxiosError<any>' but required in type 'import("/home/runner/work/jolly-roger/jolly-roger/node_modules/gaxios/build/src/common").GaxiosError<any>'.

googleapis-common has a conflicting dependency on an older version of gaxios but I'm not immediately sure how to resolve it. Locally it seems like reinstalling googleapis-common fixed it, but the server build is failing even after I blew away the cache... Need to dig further, but let me know if you have any ideas.

The default retry policy (documented
[here](https://github.com/googleapis/gaxios/#request-options)) does not
retry POST requests. But most Google API calls use POST requests under
the covers, including read-only APIs like [Drive
Activity](https://developers.google.com/drive/activity/v2/reference/rest/v2/activity/query).
So we can also ask to retry POST requests.

We retain other aspects of the default policy, such as the number of
attempts (3) and the HTTP status codes to retry (100-109, 408, 429,
5xx). If a request succeeds on a retry, the failure is kept silent and
does not pollute the logs.

The main risk is that a POST method might mutate some state despite
returning a failure, and that inline retries might handle that
incorrectly or otherwise make things worse than retrying the full flow
from the ground up. This doesn't seem likely to be an issue with the
current set of API calls, but if desired, we could narrow this change to
specific known-safe requests.

See deathandmayhem#2160
See deathandmayhem#2161
@jpd236
Copy link
Contributor Author

jpd236 commented Jun 28, 2024

Fixed the dependency conflict by running npm dedupe. That resulted in many unrelated changes to package-lock.json. Hopefully that's okay - let me know if there's some other way to handle it.

@ebroder ebroder merged commit 48b2379 into deathandmayhem:main Jul 1, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants