What happened?
Calls to the predict method on Vertex AI custom-deployed models use the credentials configured via the GOOGLE_APPLICATION_CREDENTIALS environment variable instead of the token file uploaded when the model was created.
Meanwhile, Vertex AI models configured to use OpenAI-like completion endpoints do use the tokens uploaded during model creation in the proxy UI and produce responses as expected.
We have a situation in which a specific model in a different Vertex AI project (not the default project) has to be called through a custom predict call. It would therefore be helpful if Vertex AI custom-deployed models used the token file uploaded at model creation for predict calls.
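As a possible workaround, LiteLLM's `litellm_params` accept a per-deployment `vertex_credentials` path (alongside `vertex_project` / `vertex_location`), which is intended to take precedence over GOOGLE_APPLICATION_CREDENTIALS for that deployment. A minimal proxy config sketch (the model name, endpoint ID, project, and file path are placeholders, not values from this report):

```yaml
model_list:
  - model_name: custom-predict-model
    litellm_params:
      # hypothetical values - replace with your own endpoint and project
      model: vertex_ai/<your-endpoint-id>
      vertex_project: other-vertex-project
      vertex_location: us-central1
      # per-deployment service-account key, meant to override
      # GOOGLE_APPLICATION_CREDENTIALS for this model only
      vertex_credentials: /path/to/other_project_service_account.json
```

Whether this path is honored for custom predict calls (as opposed to the OpenAI-like completion path) is exactly what this issue questions.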
Relevant log output
{"message": "Trying to fallback b/w models", "level": "INFO", "timestamp": "2025-02-17T19:04:24.835020"}
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 85, in __await__
response = yield from self._call.__await__()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/grpc/aio/_call.py", line 327, in __await__
raise _create_rpc_error(
...<2 lines>...
)
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*******/locations/us-central1/endpoints/1984786713414729728' (or it may not exist)."
debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:"2025-02-17T19:04:22.270203099+00:00", grpc_status:7, grpc_message:"Permission \'aiplatform.endpoints.predict\' denied on resource \'//aiplatform.googleapis.com/projects/******/locations/us-central1/endpoints/1984786713414729728\' (or it may not exist)."}">
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 466, in acompletion
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py", line 738, in async_streaming
response_obj = await llm_model.predict(
^^^^^^^^^^^^^^^^^^^^^^^^
...<2 lines>...
)
^
File "/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py", line 404, in predict
response = await rpc(
^^^^^^^^^^
...<4 lines>...
)
^
File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 88, in __await__
raise exceptions.from_grpc_error(rpc_error) from rpc_error
google.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: "IAM_PERMISSION_DENIED"
domain: "aiplatform.googleapis.com"
metadata {
key: "resource"
value: "projects/*********/locations/us-central1/endpoints/1984786713414729728"
}
metadata {
key: "permission"
value: "aiplatform.endpoints.predict"
}
]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 2889, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3262, in async_function_with_retries
raise original_exception
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3155, in async_function_with_retries
response = await self.make_call(original_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3271, in make_call
response = await response
^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1042, in _acompletion
raise e
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1001, in _acompletion
response = await _response
^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1394, in wrapper_async
raise e
File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1253, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 485, in acompletion
raise exception_type(
~~~~~~~~~~~~~~^
model=model,
^^^^^^^^^^^^
...<3 lines>...
extra_kwargs=kwargs,
^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2202, in exception_type
raise e
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2178, in exception_type
raise APIConnectionError(
...<8 lines>...
)
litellm.exceptions.APIConnectionError: litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: "IAM_PERMISSION_DENIED"
domain: "aiplatform.googleapis.com"
metadata {
key: "resource"
value: "projects/*********/locations/us-central1/endpoints/1984786713414729728"
}
metadata {
key: "permission"
value: "aiplatform.endpoints.predict"
}
]
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 85, in __await__
response = yield from self._call.__await__()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/grpc/aio/_call.py", line 327, in __await__
raise _create_rpc_error(
...<2 lines>...
)
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist)."
debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:"2025-02-17T19:04:22.270203099+00:00", grpc_status:7, grpc_message:"Permission \'aiplatform.endpoints.predict\' denied on resource \'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\' (or it may not exist)."}">
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 466, in acompletion
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py", line 738, in async_streaming
response_obj = await llm_model.predict(
^^^^^^^^^^^^^^^^^^^^^^^^
...<2 lines>...
)
^
File "/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py", line 404, in predict
response = await rpc(
^^^^^^^^^^
...<4 lines>...
)
^
File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 88, in __await__
raise exceptions.from_grpc_error(rpc_error) from rpc_error
google.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: "IAM_PERMISSION_DENIED"
domain: "aiplatform.googleapis.com"
metadata {
key: "resource"
value: "projects/*********/locations/us-central1/endpoints/1984786713414729728"
}
metadata {
key: "permission"
value: "aiplatform.endpoints.predict"
}
]
LiteLLM Retried: 1 times, LiteLLM Max Retries: 2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3010, in async_function_with_fallbacks
get_fallback_model_group(
~~~~~~~~~~~~~~~~~~~~~~~~^
fallbacks=fallbacks, # if fallbacks = [{"gpt-3.5-turbo": ["claude-3-haiku"]}]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
model_group=cast(str, model_group),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/litellm/router_utils/fallback_event_handlers.py", line 61, in get_fallback_model_group
if list(item.keys())[0] == model_group: # check exact match
~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
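The secondary IndexError above fires when a fallback entry is an empty dict, so `list(item.keys())[0]` has nothing to index. A defensive lookup along these lines would avoid it (a hypothetical standalone helper, not LiteLLM's actual code):

```python
def find_fallback(fallbacks, model_group):
    """Return the fallback list for model_group, or None.

    fallbacks is shaped like [{"gpt-3.5-turbo": ["claude-3-haiku"]}].
    Empty dict entries are skipped instead of raising IndexError.
    """
    for item in fallbacks:
        keys = list(item.keys())
        if keys and keys[0] == model_group:  # guard against empty dicts
            return item[keys[0]]
    return None


# An empty dict in the list no longer crashes the lookup:
print(find_fallback([{}, {"gpt-3.5-turbo": ["claude-3-haiku"]}], "gpt-3.5-turbo"))
```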
{"message": "litellm.router.py::async_function_with_fallbacks() - Error occurred while trying to do fallbacks - list index out of range\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise 
exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 2889, in async_function_with_fallbacks\n response = await self.async_function_with_retries(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3262, in async_function_with_retries\n raise original_exception\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3155, in async_function_with_retries\n response = await self.make_call(original_function, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3271, in make_call\n response = await response\n ^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1042, in _acompletion\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1001, in _acompletion\n response = await _response\n ^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1394, in wrapper_async\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1253, in wrapper_async\n result = await original_function(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", 
line 485, in acompletion\n raise exception_type(\n ~~~~~~~~~~~~~~^\n model=model,\n ^^^^^^^^^^^^\n ...<3 lines>...\n extra_kwargs=kwargs,\n ^^^^^^^^^^^^^^^^^^^^\n )\n ^\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2202, in exception_type\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2178, in exception_type\n raise APIConnectionError(\n ...<8 lines>...\n )\nlitellm.exceptions.APIConnectionError: litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource 
\\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n LiteLLM Retried: 1 times, LiteLLM Max Retries: 2\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3010, in async_function_with_fallbacks\n get_fallback_model_group(\n ~~~~~~~~~~~~~~~~~~~~~~~~^\n fallbacks=fallbacks, # if fallbacks = [{\"gpt-3.5-turbo\": [\"claude-3-haiku\"]}]\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n model_group=cast(str, model_group),\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n )\n ^\n File \"/usr/lib/python3.13/site-packages/litellm/router_utils/fallback_event_handlers.py\", line 61, in get_fallback_model_group\n if list(item.keys())[0] == model_group: # check exact match\n ~~~~~~~~~~~~~~~~~^^^\nIndexError: list index out of range\n\n\nDebug Information:\nCooldown Deployments=[]", "level": "ERROR", "timestamp": "2025-02-17T19:04:24.849290"}
{"message": "litellm.proxy.proxy_server.chat_completion(): Exception occured - litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n 
response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nReceived Model Group=pco-llama3-1-8b-ft-icd-l4-predict\nAvailable Model Group Fallbacks=None\nError doing the fallback: list index out of range LiteLLM Retried: 1 times, LiteLLM Max Retries: 2", "level": "ERROR", "timestamp": "2025-02-17T19:04:24.855748", "stacktrace": "Traceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer 
ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/proxy/proxy_server.py\", line 3587, in chat_completion\n responses = await llm_responses\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 904, in acompletion\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 880, in acompletion\n response = await self.async_function_with_fallbacks(**kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3071, in async_function_with_fallbacks\n raise original_exception\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 2889, in async_function_with_fallbacks\n response = await self.async_function_with_retries(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3262, in async_function_with_retries\n raise original_exception\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3155, in async_function_with_retries\n response = await self.make_call(original_function, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3271, in make_call\n response = await response\n ^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1042, in _acompletion\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1001, in _acompletion\n response = await _response\n ^^^^^^^^^^^^^^^\n File 
\"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1394, in wrapper_async\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1253, in wrapper_async\n result = await original_function(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 485, in acompletion\n raise exception_type(\n ~~~~~~~~~~~~~~^\n model=model,\n ^^^^^^^^^^^^\n ...<3 lines>...\n extra_kwargs=kwargs,\n ^^^^^^^^^^^^^^^^^^^^\n )\n ^\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2202, in exception_type\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2178, in exception_type\n raise APIConnectionError(\n ...<8 lines>...\n )\nlitellm.exceptions.APIConnectionError: litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File 
\"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nReceived Model Group=pco-llama3-1-8b-ft-icd-l4-predict\nAvailable Model Group Fallbacks=None\nError doing the fallback: list index out of range LiteLLM Retried: 1 times, LiteLLM Max Retries: 2"}
{"message": "{\"event\": \"giveup\", \"exception\": \"\"}", "level": "INFO", "timestamp": "2025-02-17T19:04:24.862449"}
{"message": "Giving up chat_completion(...) after 1 tries (litellm.proxy._types.ProxyException)", "level": "ERROR", "timestamp": "2025-02-17T19:04:24.867654"}
{"message": "litellm.acompletion(model=azure/mlp-genai-npe-eastus2-gpt4o)\u001b[32m 200 OK\u001b[0m", "level": "INFO", "timestamp": "2025-02-17T19:04:27.345561"}
{"message": "disable_spend_logs=True. Skipping writing spend logs to db. Other spend updates - Key/User/Team table will still occur.", "level": "INFO", "timestamp": "2025-02-17T19:04:27.346675"}
Are you an ML Ops Team?
Yes
What LiteLLM version are you on?
v1.61.3
Twitter / LinkedIn details
No response