-
This config should do exactly what you're describing: load balance between Anthropic's API, Amazon Bedrock, and Google Vertex AI.

```yaml
litellm_settings:
  num_retries_per_request: 3

model_list:
  - model_name: claude-3-haiku-20240307
    litellm_params:
      model: bedrock/anthropic.claude-3-haiku-20240307-v1:0
      aws_region_name: us-east-1
  - model_name: claude-3-haiku-20240307
    litellm_params:
      model: vertex_ai/claude-3-haiku@20240307
      vertex_project: litellm-epic
      vertex_location: us-central1
  - model_name: claude-3-haiku-20240307
    litellm_params:
      model: anthropic/claude-3-haiku-20240307
```

For Gemini, see my existing config here: https://gist.github.com/Manouchehri/95ccd1fa2ba2f56d03deee29dc64a915
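As a rough usage sketch: because all three deployments share the same `model_name`, clients just call that alias through the proxy's OpenAI-compatible endpoint and LiteLLM picks a deployment per request. The base URL and API key below are assumptions (the proxy's default port and a placeholder master key), not from the config above.

```python
from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy started with the config above,
# e.g. `litellm --config config.yaml`.
client = OpenAI(
    base_url="http://localhost:4000",  # assumed default proxy address
    api_key="sk-1234",                 # placeholder proxy master key
)

# The shared alias is routed to Bedrock, Vertex AI, or Anthropic,
# with up to 3 retries per request as configured.
response = client.chat.completions.create(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```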
-
This isn't clear in the docs, but I'd expect it to work the same way it does between OpenAI- and Azure OpenAI-hosted models, where a load-balancing setup can be utilised. For Anthropic models, that would massively improve the reliability of any system downstream of LiteLLM.

Also related to this: given that Google Gemini models are hosted per region in GCP, I wonder if it's possible to implement multi-region load balancing, much like how Azure OpenAI support works in LiteLLM? A sketch of what that could look like follows below.

(In short, I'm a big fan of LiteLLM's OpenAI + Azure OpenAI load balancing; I'd love to see it broadened to other models.)
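As a rough sketch of what multi-region Gemini balancing could look like with `litellm.Router` today, mirroring the documented Azure OpenAI pattern rather than a dedicated feature; the model name, project ID, and regions are placeholders:

```python
from litellm import Router

# Two Vertex AI deployments of the same Gemini model in different regions,
# registered under one shared alias so the Router balances between them.
model_list = [
    {
        "model_name": "gemini-1.5-pro",
        "litellm_params": {
            "model": "vertex_ai/gemini-1.5-pro",
            "vertex_project": "my-project",      # placeholder GCP project
            "vertex_location": "us-central1",
        },
    },
    {
        "model_name": "gemini-1.5-pro",
        "litellm_params": {
            "model": "vertex_ai/gemini-1.5-pro",
            "vertex_project": "my-project",      # placeholder GCP project
            "vertex_location": "europe-west4",
        },
    },
]

router = Router(model_list=model_list, num_retries=3)

# Calls on the shared alias fail over between regions on errors.
response = router.completion(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```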