
Using Provisioned Throughput for base model gemini-1.5-pro-002 through v1.9.2 #483

Open
ericsen-pdai opened this issue Dec 18, 2024 · 0 comments
Labels: api: aiplatform · priority: p3 · type: question

Comments

@ericsen-pdai

I'm calling the gemini-1.5-pro-002 base model through this library, and after purchasing Provisioned Throughput, all my calls to Gemini are still using shared resources instead of dedicated capacity.

The code is pretty simple and straightforward:

    import { VertexAI, GenerateContentRequest } from '@google-cloud/vertexai';

    const vertex_ai = new VertexAI({
      project: projectId.value(),
      location: location.value(),
    });

    // Text prompts are wrapped in a Part object ({ text: ... }).
    const req: GenerateContentRequest = {
      contents: [{ role: 'user', parts: [{ text: fullPrompt }] }],
    };

    const generativeModel = vertex_ai.preview.getGenerativeModel({
      model: 'gemini-1.5-pro-002',
      safetySettings: [ ... ],
    });

    const streamingResp = await generativeModel.generateContentStream(req);
    const response = await streamingResp.response;

According to the Provisioned Throughput (PT) documentation, I shouldn't need to change my code to get the default behavior: requests consume PT first, and overages are billed on a pay-as-you-go basis.
PT doc: https://cloud.google.com/vertex-ai/generative-ai/docs/provisioned-throughput

However, checking the metrics in the Google Cloud console, all my requests are still hitting shared resources.
Metric: aiplatform.googleapis.com/publisher/online_serving/consumed_throughput
The request_type label is always shared.
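
For reference, this is roughly how I'm reading that metric programmatically (a sketch using the @google-cloud/monitoring client; the one-hour window and the request_type label access are just for illustration):

    import { MetricServiceClient } from '@google-cloud/monitoring';

    const monitoring = new MetricServiceClient();

    // List consumed_throughput time series for the last hour and print the
    // request_type label of each series ('shared' vs. 'dedicated').
    async function checkRequestType(projectId: string) {
      const now = Math.floor(Date.now() / 1000);
      const [series] = await monitoring.listTimeSeries({
        name: `projects/${projectId}`,
        filter:
          'metric.type="aiplatform.googleapis.com/publisher/online_serving/consumed_throughput"',
        interval: {
          startTime: { seconds: now - 3600 },
          endTime: { seconds: now },
        },
      });
      for (const ts of series) {
        console.log(ts.metric?.labels?.['request_type'], '-', ts.points?.length, 'points');
      }
    }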

I haven't created an endpoint in my project because I want to use the base model gemini-1.5-pro-002. I did try creating an endpoint for the base model anyway, but that isn't possible either in the console or with gcloud.

Is something wrong with my setup? The PT doc mentions that PT can be used with base models.
Do I need to force PT somehow through this library?
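
For what it's worth, the PT doc describes an X-Vertex-AI-LLM-Request-Type header for controlling how traffic is routed. If that's the intended mechanism here, I'd expect something like the sketch below, using the customHeaders field of RequestOptions, to work (whether this is actually the right approach is exactly my question):

    import { VertexAI } from '@google-cloud/vertexai';

    const vertexAi = new VertexAI({ project: 'my-project', location: 'us-central1' });

    // Sketch: attach the header described in the PT guide to every request.
    // 'dedicated' should mean PT only (no pay-as-you-go overages);
    // 'shared' should mean pay-as-you-go only.
    const model = vertexAi.preview.getGenerativeModel(
      { model: 'gemini-1.5-pro-002' },
      {
        customHeaders: new Headers({
          'X-Vertex-AI-LLM-Request-Type': 'dedicated',
        }),
      },
    );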

ericsen-pdai added the priority: p3 and type: question labels on Dec 18, 2024
product-auto-label bot added the api: aiplatform label on Dec 18, 2024
ericsen-pdai changed the title from "Using Provisioned Throughput for base model gemini-1.5-002 through v1.9.2" to "Using Provisioned Throughput for base model gemini-1.5-pro-002 through v1.9.2" on Dec 18, 2024