What happened?
Calls to the predict method on Vertex AI custom-deployed models use the credentials configured via the GOOGLE_APPLICATION_CREDENTIALS environment variable instead of the token file uploaded when the model was created.
Meanwhile, Vertex AI models configured to use OpenAI-like completion endpoints do use the tokens uploaded during model creation in the proxy UI and produce responses as expected.
We have a situation in which a specific model in a different Vertex AI project (not the default project) has to be called through a custom predict call. It would therefore be helpful if Vertex AI custom-deployed models used the token file uploaded at model creation for predict calls.
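As a possible workaround, LiteLLM's `litellm_params` accept a per-deployment `vertex_credentials` path (alongside `vertex_project` / `vertex_location`), which is intended to take precedence over GOOGLE_APPLICATION_CREDENTIALS for that deployment. A minimal proxy config sketch (the model name, endpoint ID, project, and file path are placeholders, not values from this report):

```yaml
model_list:
  - model_name: custom-predict-model
    litellm_params:
      # hypothetical values - replace with your own endpoint and project
      model: vertex_ai/<your-endpoint-id>
      vertex_project: other-vertex-project
      vertex_location: us-central1
      # per-deployment service-account key, meant to override
      # GOOGLE_APPLICATION_CREDENTIALS for this model only
      vertex_credentials: /path/to/other_project_service_account.json
```

Whether this path is honored for custom predict calls (as opposed to the OpenAI-like completion path) is exactly what this issue questions.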
Relevant log output
{"message": "Trying to fallback b/w models", "level": "INFO", "timestamp": "2025-02-17T19:04:24.835020"}
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 85, in __await__
response = yield from self._call.__await__()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/grpc/aio/_call.py", line 327, in __await__
raise _create_rpc_error(
...<2 lines>...
)
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*******/locations/us-central1/endpoints/1984786713414729728' (or it may not exist)."
debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:"2025-02-17T19:04:22.270203099+00:00", grpc_status:7, grpc_message:"Permission \'aiplatform.endpoints.predict\' denied on resource \'//aiplatform.googleapis.com/projects/******/locations/us-central1/endpoints/1984786713414729728\' (or it may not exist)."}">
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 466, in acompletion
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py", line 738, in async_streaming
response_obj = await llm_model.predict(
^^^^^^^^^^^^^^^^^^^^^^^^
...<2 lines>...
)
^
File "/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py", line 404, in predict
response = await rpc(
^^^^^^^^^^
...<4 lines>...
)
^
File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 88, in __await__
raise exceptions.from_grpc_error(rpc_error) from rpc_error
google.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: "IAM_PERMISSION_DENIED"
domain: "aiplatform.googleapis.com"
metadata {
key: "resource"
value: "projects/*********/locations/us-central1/endpoints/1984786713414729728"
}
metadata {
key: "permission"
value: "aiplatform.endpoints.predict"
}
]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 2889, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3262, in async_function_with_retries
raise original_exception
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3155, in async_function_with_retries
response = await self.make_call(original_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3271, in make_call
response = await response
^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1042, in _acompletion
raise e
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 1001, in _acompletion
response = await _response
^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1394, in wrapper_async
raise e
File "/usr/lib/python3.13/site-packages/litellm/utils.py", line 1253, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 485, in acompletion
raise exception_type(
~~~~~~~~~~~~~~^
model=model,
^^^^^^^^^^^^
...<3 lines>...
extra_kwargs=kwargs,
^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2202, in exception_type
raise e
File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2178, in exception_type
raise APIConnectionError(
...<8 lines>...
)
litellm.exceptions.APIConnectionError: litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: "IAM_PERMISSION_DENIED"
domain: "aiplatform.googleapis.com"
metadata {
key: "resource"
value: "projects/*********/locations/us-central1/endpoints/1984786713414729728"
}
metadata {
key: "permission"
value: "aiplatform.endpoints.predict"
}
]
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 85, in __await__
response = yield from self._call.__await__()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/grpc/aio/_call.py", line 327, in __await__
raise _create_rpc_error(
...<2 lines>...
)
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist)."
debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:"2025-02-17T19:04:22.270203099+00:00", grpc_status:7, grpc_message:"Permission \'aiplatform.endpoints.predict\' denied on resource \'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\' (or it may not exist)."}">
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/main.py", line 466, in acompletion
response = await init_response
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py", line 738, in async_streaming
response_obj = await llm_model.predict(
^^^^^^^^^^^^^^^^^^^^^^^^
...<2 lines>...
)
^
File "/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py", line 404, in predict
response = await rpc(
^^^^^^^^^^
...<4 lines>...
)
^
File "/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py", line 88, in __await__
raise exceptions.from_grpc_error(rpc_error) from rpc_error
google.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: "IAM_PERMISSION_DENIED"
domain: "aiplatform.googleapis.com"
metadata {
key: "resource"
value: "projects/*********/locations/us-central1/endpoints/1984786713414729728"
}
metadata {
key: "permission"
value: "aiplatform.endpoints.predict"
}
]
LiteLLM Retried: 1 times, LiteLLM Max Retries: 2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3010, in async_function_with_fallbacks
get_fallback_model_group(
~~~~~~~~~~~~~~~~~~~~~~~~^
fallbacks=fallbacks, # if fallbacks = [{"gpt-3.5-turbo": ["claude-3-haiku"]}]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
model_group=cast(str, model_group),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/litellm/router_utils/fallback_event_handlers.py", line 61, in get_fallback_model_group
if list(item.keys())[0] == model_group: # check exact match
~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
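The secondary IndexError above fires when a fallback entry is an empty dict, so `list(item.keys())[0]` has nothing to index. A defensive lookup along these lines would avoid it (a hypothetical standalone helper, not LiteLLM's actual code):

```python
def find_fallback(fallbacks, model_group):
    """Return the fallback list for model_group, or None.

    fallbacks is shaped like [{"gpt-3.5-turbo": ["claude-3-haiku"]}].
    Empty dict entries are skipped instead of raising IndexError.
    """
    for item in fallbacks:
        keys = list(item.keys())
        if keys and keys[0] == model_group:  # guard against empty dicts
            return item[keys[0]]
    return None


# An empty dict in the list no longer crashes the lookup:
print(find_fallback([{}, {"gpt-3.5-turbo": ["claude-3-haiku"]}], "gpt-3.5-turbo"))
```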
{"message": "litellm.router.py::async_function_with_fallbacks() - Error occurred while trying to do fallbacks - list index out of range\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise 
exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 2889, in async_function_with_fallbacks\n response = await self.async_function_with_retries(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3262, in async_function_with_retries\n raise original_exception\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3155, in async_function_with_retries\n response = await self.make_call(original_function, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3271, in make_call\n response = await response\n ^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1042, in _acompletion\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1001, in _acompletion\n response = await _response\n ^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1394, in wrapper_async\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1253, in wrapper_async\n result = await original_function(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", 
line 485, in acompletion\n raise exception_type(\n ~~~~~~~~~~~~~~^\n model=model,\n ^^^^^^^^^^^^\n ...<3 lines>...\n extra_kwargs=kwargs,\n ^^^^^^^^^^^^^^^^^^^^\n )\n ^\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2202, in exception_type\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2178, in exception_type\n raise APIConnectionError(\n ...<8 lines>...\n )\nlitellm.exceptions.APIConnectionError: litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource 
\\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n LiteLLM Retried: 1 times, LiteLLM Max Retries: 2\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3010, in async_function_with_fallbacks\n get_fallback_model_group(\n ~~~~~~~~~~~~~~~~~~~~~~~~^\n fallbacks=fallbacks, # if fallbacks = [{\"gpt-3.5-turbo\": [\"claude-3-haiku\"]}]\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n model_group=cast(str, model_group),\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n )\n ^\n File \"/usr/lib/python3.13/site-packages/litellm/router_utils/fallback_event_handlers.py\", line 61, in get_fallback_model_group\n if list(item.keys())[0] == model_group: # check exact match\n ~~~~~~~~~~~~~~~~~^^^\nIndexError: list index out of range\n\n\nDebug Information:\nCooldown Deployments=[]", "level": "ERROR", "timestamp": "2025-02-17T19:04:24.849290"}
{"message": "litellm.proxy.proxy_server.chat_completion(): Exception occured - litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n 
response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nReceived Model Group=pco-llama3-1-8b-ft-icd-l4-predict\nAvailable Model Group Fallbacks=None\nError doing the fallback: list index out of range LiteLLM Retried: 1 times, LiteLLM Max Retries: 2", "level": "ERROR", "timestamp": "2025-02-17T19:04:24.855748", "stacktrace": "Traceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer 
ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/proxy/proxy_server.py\", line 3587, in chat_completion\n responses = await llm_responses\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 904, in acompletion\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 880, in acompletion\n response = await self.async_function_with_fallbacks(**kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3071, in async_function_with_fallbacks\n raise original_exception\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 2889, in async_function_with_fallbacks\n response = await self.async_function_with_retries(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3262, in async_function_with_retries\n raise original_exception\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3155, in async_function_with_retries\n response = await self.make_call(original_function, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 3271, in make_call\n response = await response\n ^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1042, in _acompletion\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/router.py\", line 1001, in _acompletion\n response = await _response\n ^^^^^^^^^^^^^^^\n File 
\"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1394, in wrapper_async\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/utils.py\", line 1253, in wrapper_async\n result = await original_function(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 485, in acompletion\n raise exception_type(\n ~~~~~~~~~~~~~~^\n model=model,\n ^^^^^^^^^^^^\n ...<3 lines>...\n extra_kwargs=kwargs,\n ^^^^^^^^^^^^^^^^^^^^\n )\n ^\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2202, in exception_type\n raise e\n File \"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 2178, in exception_type\n raise APIConnectionError(\n ...<8 lines>...\n )\nlitellm.exceptions.APIConnectionError: litellm.APIConnectionError: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 85, in __await__\n response = yield from self._call.__await__()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/grpc/aio/_call.py\", line 327, in __await__\n raise _create_rpc_error(\n ...<2 lines>...\n )\ngrpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.250.191.138:443 {created_time:\"2025-02-17T19:04:22.270203099+00:00\", grpc_status:7, grpc_message:\"Permission \\'aiplatform.endpoints.predict\\' denied on resource \\'//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728\\' (or it may not exist).\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.13/site-packages/litellm/main.py\", line 466, in acompletion\n response = await init_response\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/vertex_ai_non_gemini.py\", line 738, in async_streaming\n response_obj = await llm_model.predict(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n ...<2 lines>...\n )\n ^\n File \"/usr/lib/python3.13/site-packages/google/cloud/aiplatform_v1/services/prediction_service/async_client.py\", line 404, in predict\n response = await rpc(\n ^^^^^^^^^^\n ...<4 lines>...\n )\n ^\n File 
\"/usr/lib/python3.13/site-packages/google/api_core/grpc_helpers_async.py\", line 88, in __await__\n raise exceptions.from_grpc_error(rpc_error) from rpc_error\ngoogle.api_core.exceptions.PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/*********/locations/us-central1/endpoints/1984786713414729728' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"aiplatform.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/*********/locations/us-central1/endpoints/1984786713414729728\"\n}\nmetadata {\n key: \"permission\"\n value: \"aiplatform.endpoints.predict\"\n}\n]\n\nReceived Model Group=pco-llama3-1-8b-ft-icd-l4-predict\nAvailable Model Group Fallbacks=None\nError doing the fallback: list index out of range LiteLLM Retried: 1 times, LiteLLM Max Retries: 2"}
{"message": "{\"event\": \"giveup\", \"exception\": \"\"}", "level": "INFO", "timestamp": "2025-02-17T19:04:24.862449"}
{"message": "Giving up chat_completion(...) after 1 tries (litellm.proxy._types.ProxyException)", "level": "ERROR", "timestamp": "2025-02-17T19:04:24.867654"}
{"message": "litellm.acompletion(model=azure/mlp-genai-npe-eastus2-gpt4o)\u001b[32m 200 OK\u001b[0m", "level": "INFO", "timestamp": "2025-02-17T19:04:27.345561"}
{"message": "disable_spend_logs=True. Skipping writing spend logs to db. Other spend updates - Key/User/Team table will still occur.", "level": "INFO", "timestamp": "2025-02-17T19:04:27.346675"}
Are you an ML Ops Team?
Yes
What LiteLLM version are you on?
v1.61.3
Twitter / LinkedIn details
No response