getSignedUrl slow when using Workload Identity on GKE #1386
It does appear that credentials aren't cached at all when signing objects; I found this in the signer.
When getting signed URLs, the credentials are always loaded only so that the client email can be used in the signing blob. The Google Auth client caches the email on the first call, and it never changes. When using GCE/GKE for authentication, calling `getCredentials()` makes a call to the metadata service. This call is often not needed because the email is already known. In the case of signing many files at once, this can really degrade the metadata service and cause timeout errors. This change simply uses the cached client email if it exists, before making a remote call. Fixes googleapis#1386
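A minimal sketch of the caching pattern that description implies (this is not the library's actual code; the `cachedEmail` variable and the shape of `authClient` are illustrative, assuming it exposes `getCredentials()` as google-auth-library does):

```js
// Prefer a client email we've already seen over a fresh metadata-server call.
let cachedEmail;

async function getClientEmailForSigning(authClient) {
  if (cachedEmail) {
    return cachedEmail; // no metadata-server round trip
  }
  const { client_email } = await authClient.getCredentials();
  cachedEmail = client_email;
  return cachedEmail;
}
```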
@shaffeeullah @nhumrich Afraid I have bad news. We have updated to the latest stable version of this library. After this update we did not observe performance improving. Looking at the timing results of a hot codepath request: the deployment with the keyfile kicked in after 12:20pm, and the average request timing dropped from around 3.5 seconds to around 1 second. That was literally the only change in the 12:20pm deployment (specifying the JSON keyfile instead of the service account through Workload Identity).

P.S. I don't think I can reopen this ticket, so I'll wait for you to do it; if it doesn't show up in your notifications I'll open a new ticket after some time. Thanks!
Oh, and for a bit more context: this single request is signing 170 objects at once.
@edorivai The auth library is caching the credentials correctly. You can see the code for this here.
Please reopen if you have further questions.
For anyone else running into this issue: it seems this library is up to date, but @google-cloud/common depends on an old version (7.0.2) of google-auth-library, so the fix isn't making it into most downstream libraries, such as @google-cloud/storage.
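To check which copy your own install actually resolves (a diagnostic suggestion, not from the thread), run `npm ls google-auth-library`, or resolve it from Node:

```js
// Prints the path of the google-auth-library copy that a bare require()
// resolves to from here; a path nested under @google-cloud/common/node_modules
// would indicate the old pinned version is still in play.
console.log(require.resolve("google-auth-library"));
```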
@nhumrich What change are you referring to? I am in the process of updating the library version so that common is up to date, but the caching logic linked above has not changed between versions.
@shaffeeullah You are right, that hasn't changed for two years. Well, I am at a loss. I too am having a problem where caching isn't happening when using the storage library to sign objects. It seems to only happen on GKE using Workload Identity. I tried to get to the bottom of it; it seemed like the Workload Identity server was getting overloaded. I tried cloning this repo instead of using the packaged version, and the problem went away. I guess I need to do more debugging, but it's hard to catch, as it only happens on GKE, not locally.
I have also been having issues with generating signed URLs, where code worked yesterday but is broken upon redeploying today. Minimal example of a cloud function:

```js
import { Storage } from "@google-cloud/storage";

const storage = new Storage();

export async function run(event, context) {
  const file = storage.bucket("<bucket_name>").file("test.mp3");
  const options = {
    version: 'v4',
    action: 'write',
    expires: Date.now() + 15 * 60 * 1000, // 15 minutes
    contentType: 'application/octet-stream',
  };
  await file.getSignedUrl(options)
    .then(console.log)
    .catch(console.error);
}
```

produces an authentication error.
This may not be related, but the fact that the PR above updates the auth library and my error is about auth (and the fact that it started around 17 hours ago) makes me think it's related to this.
Hello! We were facing a similar issue, where we were seeing performance degradation in Cloud Run, but the performance was acceptable on our local machines. We ended up changing our code so that we specified the credentials ourselves, instead of relying on Google getting them from their metadata server in the cloud. So, our code looks something like:

```js
const { Storage } = require("@google-cloud/storage");

// The private key may arrive JSON-encoded (e.g. from a secret store);
// fall back to the raw string if it isn't.
let parsedPrivateKey;
try {
  parsedPrivateKey = JSON.parse(googleStorageCredentials.privateKey);
} catch (err) {
  parsedPrivateKey = googleStorageCredentials.privateKey;
}

const storageClient = new Storage({
  projectId: googleCredentials.projectId,
  credentials: {
    client_email: googleStorageCredentials.clientEmail,
    private_key: parsedPrivateKey,
  },
});
```
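For completeness, a sketch of the same idea using a keyfile path instead of inlined credentials (the path here is a made-up example; setting `GOOGLE_APPLICATION_CREDENTIALS` to the same file also works). With the private key available locally, the client can compute V4 signatures itself rather than calling a remote signing service for every URL:

```js
const { Storage } = require("@google-cloud/storage");

// keyFilename points the client at a service-account JSON keyfile.
// The path below is a placeholder for wherever the secret is mounted.
const storageClient = new Storage({
  keyFilename: "/var/secrets/google/service-account.json",
});
```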
Two-plus years later, this remains a problem. |
Our application needs to sign objects in relatively large bursts. A single request from our browser application can result in hundreds of storage objects being returned by our server, all of which should be signed concurrently (roughly as in the sketch below).
We're observing that signing is slow for large batches when running our application in GKE. In contrast, on our local dev machines, signing is fast for large batches.
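A minimal illustration of that access pattern (bucket and object names are assumed):

```js
const { Storage } = require("@google-cloud/storage");
const storage = new Storage();

// Sign a batch of objects concurrently; each call may independently consult
// the credential source, which is where the metadata server gets hammered.
async function signBatch(bucketName, objectNames) {
  return Promise.all(
    objectNames.map(name =>
      storage
        .bucket(bucketName)
        .file(name)
        .getSignedUrl({
          version: "v4",
          action: "read",
          expires: Date.now() + 15 * 60 * 1000, // 15 minutes
        })
        .then(([url]) => url)
    )
  );
}
```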
A notable difference between our production env (GKE) and our local environments is that GKE is set up with Workload Identity, while local development uses a local service account key file (JSON).
In some cases we've seen timeout errors, but in most cases signing is just slow (~1 second).
I'm pretty sure that 169.254.169.254 is the local IP for the GKE metadata server. So my current hypothesis is that the metadata daemon in GKE is either hitting an actual resource limit, or it's being throttled somewhere. I also believe that the @google-cloud/storage module is not caching credential responses from the metadata service.
This is quite unfortunate, since we recently migrated from using service account keyfiles in our GKE Secrets to using Workload Identity, as it seems to be the recommended mode of authenticating our workloads.
I wonder if there are intrinsic limitations to Workload Identity that prevent this library from caching the credential response. If so, would there be a way for us to take over the process of fetching the credentials using Workload Identity, so we can enforce our own caching policy here?
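One possible shape of such a takeover, sketched under the assumption that the storage client exposes its auth client as `authClient` (a `GoogleAuth` instance, as in @google-cloud/common's `Service`):

```js
const { Storage } = require("@google-cloud/storage");
const storage = new Storage();

// Fetch credentials from the metadata server once at startup, so the auth
// client has them in hand before the first burst of signing requests arrives.
async function warmCredentials() {
  await storage.authClient.getCredentials();
}

warmCredentials().catch(console.error);
```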
Environment details

- Docker base image: `FROM node:12.16.3-alpine3.11`
- `@google-cloud/storage` version: 5.5.0
Steps to reproduce

1. Run on GKE with Workload Identity enabled.
2. Call `storage.bucket('...').file('...').getSignedUrl(...)` for many objects concurrently (see the sketch below).
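A fuller repro sketch (the bucket/object names and the timing harness are illustrative, not from the report):

```js
const { Storage } = require("@google-cloud/storage");
const storage = new Storage();

async function main() {
  console.time("getSignedUrl");
  const [url] = await storage
    .bucket("my-bucket")     // hypothetical bucket
    .file("my-object.bin")   // hypothetical object
    .getSignedUrl({
      version: "v4",
      action: "read",
      expires: Date.now() + 15 * 60 * 1000, // 15 minutes
    });
  console.timeEnd("getSignedUrl");
  console.log(url);
}

main().catch(console.error);
```

On GKE with Workload Identity, the timer above is where the multi-second latency shows up; locally with a JSON keyfile the call completes quickly.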