-
This config should do exactly what you're describing: load balance between Anthropic's API, Amazon Bedrock, and Google Vertex AI.

```yaml
litellm_settings:
  num_retries_per_request: 3

model_list:
  - model_name: claude-3-haiku-20240307
    litellm_params:
      model: bedrock/anthropic.claude-3-haiku-20240307-v1:0
      aws_region_name: us-east-1
  - model_name: claude-3-haiku-20240307
    litellm_params:
      model: vertex_ai/claude-3-haiku@20240307
      vertex_project: litellm-epic
      vertex_location: us-central1
  - model_name: claude-3-haiku-20240307
    litellm_params:
      model: anthropic/claude-3-haiku-20240307
```

For Gemini, see my existing config here: https://gist.github.com/Manouchehri/95ccd1fa2ba2f56d03deee29dc64a915
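As a rough usage sketch: because all three deployments share the same `model_name`, clients just call that alias through the proxy's OpenAI-compatible endpoint and LiteLLM picks a deployment per request. The base URL and API key below are assumptions (the proxy's default port and a placeholder master key), not from the config above.

```python
from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy started with the config above,
# e.g. `litellm --config config.yaml`.
client = OpenAI(
    base_url="http://localhost:4000",  # assumed default proxy address
    api_key="sk-1234",                 # placeholder proxy master key
)

# The shared alias is routed to Bedrock, Vertex AI, or Anthropic,
# with up to 3 retries per request as configured.
response = client.chat.completions.create(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```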
-
This isn't clear in the docs, but I'd expect it to work the same way it does between OpenAI- and Azure OpenAI-hosted models, where a load-balancing setup can be utilised. For Anthropic models, that would massively improve the reliability of any system downstream of LiteLLM.

Also related to this: given that Google Gemini models are hosted per region in GCP, I wonder if it's possible to implement multi-region load balancing, much like how Azure OpenAI support works in LiteLLM? A sketch of what that could look like follows below.

(In short, I'm a big fan of LiteLLM's OpenAI + Azure OpenAI load balancing; I'd love to see it broadened to other models.)
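As a rough sketch of what multi-region Gemini balancing could look like with `litellm.Router` today, mirroring the documented Azure OpenAI pattern rather than a dedicated feature; the model name, project ID, and regions are placeholders:

```python
from litellm import Router

# Two Vertex AI deployments of the same Gemini model in different regions,
# registered under one shared alias so the Router balances between them.
model_list = [
    {
        "model_name": "gemini-1.5-pro",
        "litellm_params": {
            "model": "vertex_ai/gemini-1.5-pro",
            "vertex_project": "my-project",      # placeholder GCP project
            "vertex_location": "us-central1",
        },
    },
    {
        "model_name": "gemini-1.5-pro",
        "litellm_params": {
            "model": "vertex_ai/gemini-1.5-pro",
            "vertex_project": "my-project",      # placeholder GCP project
            "vertex_location": "europe-west4",
        },
    },
]

router = Router(model_list=model_list, num_retries=3)

# Calls on the shared alias fail over between regions on errors.
response = router.completion(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```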