getSignedUrl slow when using Workload Identity on GKE #1386

Closed
edorivai opened this issue Jan 21, 2021 · 12 comments
Assignees
Labels
api: storage Issues related to the googleapis/nodejs-storage API. type: question Request for information or clarification. Not an issue.

Comments

@edorivai

Our application needs to sign objects in relatively large bursts. A single request from our browser application could result in 100s of storage objects to be returned by our server, which should then all be signed concurrently.

We're observing the signing to be slow for large batches when running our application in GKE. In contrast, on our local dev machines, signing is fast for large batches.

A notable difference between our production env (GKE) and our local environments is that GKE is set up with Workload Identity, while local development uses a local service account key file (JSON).

In some cases we've seen errors like these, but in most cases signing is just slow (~1 second):

http://169.254.169.254/computeMetadata/v1/instance/service-accounts/?recursive=true network timeout at:

I'm pretty sure that 169.254.169.254 is the local IP for the GKE metadata server.

So my current hypothesis is that the metadata daemon in GKE is either hitting an actual resource limit, or it's being throttled somewhere. I also believe that this @google-cloud/storage module is not caching credential responses from the metadata service.

This is quite unfortunate, since we recently migrated from using Service Account keyfiles in our GKE Secrets, to using Workload Identity as it seems to be the recommended mode of authenticating our workloads.

I wonder if there are intrinsic limitations to Workload Identity that prevent this library from caching the credential response. If so, would there be a way for us to take over the process of fetching the credentials using Workload Identity, so we can enforce our own caching policy here?
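For illustration, here is a minimal sketch of what "our own caching policy" could look like if the credential fetch were something we controlled. It is not how this library works: `cacheAsync`, `fetchFn`, and `ttlMs` are hypothetical names, and `fetchFn` stands in for whatever call actually reaches the metadata server.

```javascript
// Sketch only: `fetchFn` is a hypothetical placeholder for the call that
// reaches the metadata server; it is not a library API.
function cacheAsync(fetchFn, ttlMs) {
  let value;           // last successful result
  let expiresAt = 0;   // epoch millis after which `value` is stale
  let inFlight = null; // shared promise while a fetch is running

  return function getCached() {
    if (Date.now() < expiresAt) return Promise.resolve(value);
    if (inFlight) return inFlight; // piggyback on the running fetch

    inFlight = Promise.resolve()
      .then(fetchFn)
      .then((result) => {
        value = result;
        expiresAt = Date.now() + ttlMs;
        return result;
      })
      .finally(() => {
        inFlight = null; // allow a new fetch after success or failure
      });
    return inFlight;
  };
}
```

With a wrapper like this, a burst of concurrent callers shares a single fetch, and a failed fetch is not cached, so the next caller retries.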

Environment details

  • OS: FROM node:12.16.3-alpine3.11 (docker base image)
  • Node.js version: 12.16.3
  • npm version: n/a
  • @google-cloud/storage version: 5.5.0

Steps to reproduce

  1. Spin up a GKE cluster with Workload Identity enabled
  2. Fire 100s of concurrent object signing requests; storage.bucket('...').file('...').getSignedUrl(...)
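Step 2 can be scripted with an adjustable burst size, which also makes it possible to check whether the slowdown tracks the number of calls in flight. The helper below is a rough sketch: `mapWithConcurrency` is a made-up name, `limit` is assumed to be at least 1, and `worker` would be whatever wraps the `getSignedUrl(...)` call.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to claim

  async function runLane() {
    while (next < items.length) {
      const i = next++; // claim synchronously, before awaiting
      results[i] = await worker(items[i], i);
    }
  }

  // Start up to `limit` lanes; each lane pulls items until none are left.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, runLane);
  await Promise.all(lanes);
  return results;
}
```

Setting `limit` to the batch size gives the fully concurrent burst described above, while a small `limit` shows whether the timeouts go away when the burst is throttled.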
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Jan 21, 2021
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jan 22, 2021
@shaffeeullah shaffeeullah added the type: question Request for information or clarification. Not an issue. label Jan 22, 2021
@yoshi-automation yoshi-automation removed the triage me I really want to be triaged. label Jan 22, 2021
@nhumrich

nhumrich commented Mar 12, 2021

It does appear that credentials aren't cached at all when signing objects. Found this in the signer:

const sign = async () => {
    const credentials = await this.authClient.getCredentials();

nhumrich added a commit to nhumrich/nodejs-storage that referenced this issue Mar 12, 2021
When getting signed urls, the credentials are always loaded
only so that the client email can be used in the signing blob.
The Google Auth client caches the email on the first call, which never
changed. When using GCE/GKE for authentication, calling
`getCredentials()` makes a call to the metadata service.
This call is often not needed because the email is already known.
In the case of signing many files at once, this can really degrade the
metadata service and cause timeout errors.
This change simply uses the cached client email if it exists, before
making a remote call

Fixes googleapis#1386
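
The idea in the commit message above, roughly sketched. This is not the actual patch: `getClientEmail`, `cachedEmail`, and `fetchCredentials` are illustrative names on a stand-in object, not fields or methods of the Google Auth client.

```javascript
// Hypothetical stand-in for an auth client; `cachedEmail` and
// `fetchCredentials` are illustrative names, not google-auth-library APIs.
async function getClientEmail(authClient) {
  if (authClient.cachedEmail) return authClient.cachedEmail; // no remote call
  const credentials = await authClient.fetchCredentials();   // remote call
  authClient.cachedEmail = credentials.client_email;
  return authClient.cachedEmail;
}
```

Note that this only helps once the first call has completed; several callers arriving before then would each still make the remote call.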
@shaffeeullah
Contributor

@nhumrich @edorivai Based on the comments in #1421, it seems that this is a non-issue. Please re-open if you are still seeing this issue.

@edorivai
Author

@shaffeeullah @nhumrich Afraid that I have bad news. We have updated to the latest stable version of this library. Relevant portions of our yarn.lock:

"@google-cloud/common@^3.6.0":
  version "3.6.0"
  resolved "https://registry.yarnpkg.com/@google-cloud/common/-/common-3.6.0.tgz#c2f6da5f79279a4a9ac7c71fc02d582beab98e8b"
  integrity sha512-aHIFTqJZmeTNO9md8XxV+ywuvXF3xBm5WNmgWeeCK+XN5X+kGW0WEX94wGwj+/MdOnrVf4dL2RvSIt9J5yJG6Q==
  dependencies:
    "@google-cloud/projectify" "^2.0.0"
    "@google-cloud/promisify" "^2.0.0"
    arrify "^2.0.1"
    duplexify "^4.1.1"
    ent "^2.2.0"
    extend "^3.0.2"
    google-auth-library "^7.0.2"
    retry-request "^4.1.1"
    teeny-request "^7.0.0"

[...]

"@google-cloud/storage@^5.8.1":
  version "5.8.1"
  resolved "https://registry.yarnpkg.com/@google-cloud/storage/-/storage-5.8.1.tgz#00e627723614bcf97e6e29f9a59ec39339171847"
  integrity sha512-qP8gCJ2myyMN3JMJN12d82Oo8VBSDO8vO4/x56dtQZX9+WISqcagurntnJVyFX885tIOtS97bsyv8qR1xv6HMg==
  dependencies:
    "@google-cloud/common" "^3.6.0"

After this update we did not observe performance improving for getSignedUrl. To dive a bit deeper I decided to roll a deployment which uses a JSON service account keyfile for GCP identity, instead of Workload Identity on our GKE workload.

The timing results of a hot codepath request can be seen below. The deployment with the keyfile kicked in after 12:20pm, you can see that the average request timing dropped from around 3.5 to around 1 second:

[image: graph of request timings before and after the 12:20pm deployment]

This was literally the only change on the deployment at 12:20pm (specifying the json keyfile instead of the service account through workload identity):
[image: the deployment change]

P.S. don't think I can reopen this ticket, I'll wait for you guys to do it, and if it doesn't show up in your notifications I'll open a new ticket after some time. Thanks!

@edorivai
Author

Oh, and for a bit more context: this single request is signing 170 objects at once.

@shaffeeullah shaffeeullah reopened this Mar 19, 2021
@shaffeeullah
Contributor

@edorivai The auth library is caching the credentials correctly. You can see the code for this here. getSignedUrl has faster performance locally than it does on GKE because for GKE a request must be made to get the service account private key and sign the file. Locally, no request is made. We cannot cache this value because it changes with different data (returning the same signed URL for every object wouldn't work). More information on the request can be found here.
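
To make the local case concrete, here is a small sketch of in-process signing with Node's built-in `crypto` module. It is only an illustration under stated assumptions: the key pair is generated on the spot as a stand-in for the `private_key` in a JSON keyfile, `signLocally` is a made-up helper, and the payload strings are placeholders rather than the real string-to-sign used for V4 URLs.

```javascript
const crypto = require("crypto");

// Throwaway key pair standing in for the private_key in a JSON keyfile.
const { privateKey, publicKey } = crypto.generateKeyPairSync("rsa", {
  modulusLength: 2048,
});

// Sign a payload in-process: CPU work only, no network round trip.
function signLocally(payload) {
  return crypto.createSign("RSA-SHA256").update(payload).sign(privateKey, "hex");
}

// Each object yields a different payload, so each needs its own signature.
const sigA = signLocally("GET /bucket/a.mp3");
const sigB = signLocally("GET /bucket/b.mp3");
console.log(sigA !== sigB);
```

Without a local private key, the equivalent of `signLocally` has to be a remote request per object, which is the step that cannot be cached.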

@shaffeeullah shaffeeullah self-assigned this May 21, 2021
@shaffeeullah
Contributor

Please reopen if you have further questions.

@nhumrich

nhumrich commented Sep 16, 2021

For anyone else running into this issue: it seems that this library is up to date, but @google-cloud/common uses an old version (7.0.2) of google-auth-library, so the fix isn't making it to most libraries, such as @google-cloud/storage.

@shaffeeullah
Contributor

@nhumrich What change are you referring to? I am in the process of updating the library version so common is most up to date, but the caching logic linked above has not changed between versions.

@nhumrich

@shaffeeullah You are right, that hasn't changed for two years. Well, I am at a loss. I too am having a problem where the caching isn't happening when using the storage library to sign objects. It seems to only happen on GKE using Workload Identity. I tried to get to the bottom of it, and it seemed like the Workload Identity server was getting overloaded. I tried cloning this repo instead of using the packaged version, and the problem went away. I guess I need to do more debugging. But it's hard to catch, as it only happens on GKE, not locally.

@stevenwaterman

I have also been having issues with generating signed URLs, where code worked yesterday but is broken upon redeploying today. Minimal example of cloud function:

import { Storage } from "@google-cloud/storage";

// Create the client once at module load, before the handler that uses it.
const storage = new Storage();

export async function run(event, context) {
    const file = storage.bucket("<bucket_name>").file("test.mp3");
    const options = {
        version: 'v4',
        action: 'write',
        expires: Date.now() + 15 * 60 * 1000, // 15 minutes
        contentType: 'application/octet-stream',
    };
    await file.getSignedUrl(options)
        .then(console.log)
        .catch(console.error);
}

produces:

Error: Failure from metadata server. at GoogleAuth.getCredentialsAsync (/workspace/node_modules/google-auth-library/build/src/auth/googleauth.js:536:19) at processTicksAndRejections (internal/process/task_queues.js:95:5) at async sign (/workspace/node_modules/@google-cloud/storage/build/src/signer.js:149:33)

This may not be related, but the fact that the PR above updates the auth library and my error is about auth (and the fact that it started around 17 hours ago) makes me think it's related to this.

@shahmirn

shahmirn commented Feb 2, 2022

Hello!

We were facing a similar issue, where we were seeing performance degradation in Cloud Run, but the performance was acceptable on our local machines.

We ended up changing our code so that we specified the credentials ourselves, instead of relying on google getting them from their metadata server in the cloud.

So, our code looks something like:

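import { Storage } from "@google-cloud/storage";

// The key may be stored as a JSON-encoded string; fall back to the raw value if it is not.
let parsedPrivateKey;
try {
  parsedPrivateKey = JSON.parse(googleStorageCredentials.privateKey);
} catch (err) {
  parsedPrivateKey = googleStorageCredentials.privateKey;
}

// Pass the credentials explicitly so the client does not go to the metadata server.
const storageClient = new Storage({
  projectId: googleCredentials.projectId,
  credentials: {
    client_email: googleStorageCredentials.clientEmail,
    private_key: parsedPrivateKey,
  },
});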
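
The try/catch above can be pulled into a small helper that is easy to test on its own. This is a sketch resting on an assumption: that the key sometimes arrives wrapped as a JSON string literal (for example, from a secret store) and sometimes as a plain PEM string. `normalizePrivateKey` is a made-up name.

```javascript
// Accept a PEM key either as-is or wrapped as a JSON string literal.
// Assumption: those are the only two shapes the stored value can take.
function normalizePrivateKey(raw) {
  try {
    const parsed = JSON.parse(raw);
    // Only unwrap when the JSON value really is a string.
    return typeof parsed === "string" ? parsed : raw;
  } catch (err) {
    return raw; // not JSON, assume it is already a plain PEM string
  }
}
```

Its result is what would go into the `private_key` field.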

@clintonb

Two-plus years later, this remains a problem.

7 participants