Using Provisioned Throughput for base model gemini-1.5-pro-002 through v1.9.2 #483
Labels: api: aiplatform · priority: p3 · type: question
I'm calling the gemini-1.5-pro-002 base model through this library, and after purchasing Provisioned Throughput (PT), all my calls to Gemini are still using shared resources instead of dedicated ones.
The code is pretty simple and straightforward:
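Essentially the library's standard pattern from the README (project ID, location, and prompt are placeholders):

```js
const { VertexAI } = require('@google-cloud/vertexai');

// Plain base-model call: no endpoint, no tuned model, no special options.
const vertexAI = new VertexAI({ project: 'my-project-id', location: 'us-central1' });
const generativeModel = vertexAI.getGenerativeModel({ model: 'gemini-1.5-pro-002' });

async function run() {
  const result = await generativeModel.generateContent('Hello!');
  console.log(JSON.stringify(result.response, null, 2));
}
run();
```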
According to the PT documentation, I should not need to change my code to get the default behavior (PT first, with overages billed on a pay-as-you-go basis).
PT doc: https://cloud.google.com/vertex-ai/generative-ai/docs/provisioned-throughput
However, the metrics in the Google Cloud console show that all my requests are still using shared resources.
Metric: aiplatform.googleapis.com/publisher/online_serving/consumed_throughput
The request_type label is always shared.
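For completeness, this is how I check that label programmatically — a sketch using the Cloud Monitoring client (assuming the @google-cloud/monitoring package; the metric name and request_type label are the ones above):

```js
const monitoring = require('@google-cloud/monitoring');
const client = new monitoring.MetricServiceClient();

async function listRequestTypes(projectId) {
  const nowSec = Math.floor(Date.now() / 1000);
  const [series] = await client.listTimeSeries({
    name: client.projectPath(projectId),
    filter:
      'metric.type = "aiplatform.googleapis.com/publisher/online_serving/consumed_throughput"',
    interval: {
      startTime: { seconds: nowSec - 3600 }, // last hour
      endTime: { seconds: nowSec },
    },
    view: 'FULL',
  });
  // Each time series carries a request_type label: "shared" vs. "dedicated".
  for (const ts of series) {
    console.log(ts.metric.labels.request_type, ts.points.length, 'points');
  }
}

listRequestTypes('my-project-id');
```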
I haven't created an endpoint in my project because I want to use the base model gemini-1.5-pro-002 directly. I did try to create an endpoint for the base model, but that isn't possible either in the console or with gcloud commands.
Is something wrong with my setup? The PT documentation mentions that it can be used with base models.
Do I need to force PT somehow through this library?
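The closest thing I've found is the X-Vertex-AI-LLM-Request-Type header described in the PT docs. A sketch of forcing it through this library, assuming a library version whose RequestOptions accepts customHeaders (Node 18+ provides the global Headers class):

```js
const { VertexAI } = require('@google-cloud/vertexai');

const vertexAI = new VertexAI({ project: 'my-project-id', location: 'us-central1' });

// Assumption: RequestOptions.customHeaders is available in the library version in use.
// Per the PT docs, the header value 'dedicated' uses only PT quota (and errors on
// overage), 'shared' bypasses PT entirely, and leaving it unset gives the default
// behavior (PT first, then pay-as-you-go overage).
const generativeModel = vertexAI.getGenerativeModel(
  { model: 'gemini-1.5-pro-002' },
  {
    customHeaders: new Headers({
      'X-Vertex-AI-LLM-Request-Type': 'dedicated',
    }),
  }
);

async function run() {
  const result = await generativeModel.generateContent('Hello!');
  console.log(JSON.stringify(result.response, null, 2));
}
run();
```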