Replies: 5 comments 4 replies
-
Hey @olad32, you should be able to do this. To do it well, I believe you'd need to change the db connection pool limit for a single instance (we currently set it to 100, which could be the max for some systems). We have a db table for the LLM config that we were planning on using for this. A problem I was trying to figure out was:
-
If you have time today, would love to do a quick call and talk through this: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
-
@olad32 just pushed a fix to let you control the db connection pool limit and connection timeouts for better scalability: https://docs.litellm.ai/docs/proxy/configs#configure-db-pool-limits--connection-timeouts. It should be out in the next release. Would love to do a quick call and talk through the reload config file issue: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat. Let me know if any time this week or next works!
-
Thanks for the configurable pool options. For now, another option is the Kubernetes rolling update strategy, which recreates each pod (i.e. each LiteLLM proxy instance) one by one with the new static config.yaml. It generates a bit of noise on the cluster, since every pod is recreated; not ideal, but manageable. One thing is mandatory for this option to work: the LiteLLM proxy must handle graceful shutdown, i.e. handle the SIGTERM signal sent by Kubernetes and wait for in-flight requests to finish before actually shutting down. This is especially useful for long streaming responses. In fact, graceful shutdown is always a good thing to handle.
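To make the graceful shutdown requirement concrete, here is a minimal sketch in plain `asyncio`. It is not LiteLLM's actual code: the `GracefulShutdown` class and its `install`, `run`, and `drain` methods are hypothetical names used only for illustration. The idea is to stop accepting new requests once SIGTERM arrives, then wait (up to a timeout) for in-flight requests such as long streaming responses to finish.

```python
import asyncio
import signal


class GracefulShutdown:
    """Hypothetical helper: tracks in-flight requests and drains them on SIGTERM."""

    def __init__(self) -> None:
        # Create this object inside a running event loop.
        self.shutting_down = asyncio.Event()
        self._in_flight = 0
        self._idle = asyncio.Event()
        self._idle.set()  # nothing is in flight at start

    def install(self, loop: asyncio.AbstractEventLoop) -> None:
        # Kubernetes sends SIGTERM before stopping a pod.
        # add_signal_handler is only available on Unix event loops.
        loop.add_signal_handler(signal.SIGTERM, self.shutting_down.set)

    async def run(self, handler, *args):
        """Run one request handler, refusing new work once shutdown has begun."""
        if self.shutting_down.is_set():
            raise RuntimeError("shutting down, not accepting new requests")
        self._in_flight += 1
        self._idle.clear()
        try:
            return await handler(*args)
        finally:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.set()

    async def drain(self, timeout: float) -> bool:
        """Wait for SIGTERM, then for in-flight requests to end.

        Returns False if the timeout expired while requests were still running.
        """
        await self.shutting_down.wait()
        try:
            await asyncio.wait_for(self._idle.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False
```

In a real deployment the drain timeout would need to be shorter than the pod's `terminationGracePeriodSeconds` (30 seconds by default), otherwise Kubernetes sends SIGKILL before the remaining requests complete.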
-
Hi @olad32 just wanted to follow up
-
Hi, is it possible to run multiple instances in parallel in order to scale horizontally, even with the database features in use?
Is there anything to know in order to roll out a new LiteLLM config to multiple instances without downtime, e.g. to update the model config?
Thanks