Replies: 3 comments 6 replies
-
hey @olad32 my concern with this is that requiring the proxy to run with a model would require deploying on a GPU. Is this not an approach you can already try with the custom routing strategy implementation - https://docs.litellm.ai/docs/routing#advanced---routing-strategies-%EF%B8%8F

```python
from typing import Dict, List, Optional, Union

from litellm.router import CustomRoutingStrategyBase


class CustomRoutingStrategy(CustomRoutingStrategyBase):
    async def async_get_available_deployment(
        self,
        model: str,
        messages: Optional[List[Dict[str, str]]] = None,
        input: Optional[Union[str, List]] = None,
        specific_deployment: Optional[bool] = False,
        request_kwargs: Optional[Dict] = None,
    ):
        """
        Asynchronously retrieves the available deployment based on the given parameters.

        Args:
            model (str): The name of the model.
            messages (Optional[List[Dict[str, str]]], optional): The list of messages for a given request. Defaults to None.
            input (Optional[Union[str, List]], optional): The input for a given embedding request. Defaults to None.
            specific_deployment (Optional[bool], optional): Whether to retrieve a specific deployment. Defaults to False.
            request_kwargs (Optional[Dict], optional): Additional request keyword arguments. Defaults to None.

        Returns:
            Returns an element from litellm.router.model_list
        """
        print("In CUSTOM async get available deployment")
        # `router` is the litellm Router instance this strategy is attached to,
        # configured elsewhere in your application.
        model_list = router.model_list
        print("router model list=", model_list)
        for deployment in model_list:
            if isinstance(deployment, dict):
                if deployment["litellm_params"]["model"] == "openai/very-special-endpoint":
                    return deployment

    def get_available_deployment(
        self,
        model: str,
        messages: Optional[List[Dict[str, str]]] = None,
        input: Optional[Union[str, List]] = None,
        specific_deployment: Optional[bool] = False,
        request_kwargs: Optional[Dict] = None,
    ):
        """
        Synchronously retrieves the available deployment based on the given parameters.

        Args:
            model (str): The name of the model.
            messages (Optional[List[Dict[str, str]]], optional): The list of messages for a given request. Defaults to None.
            input (Optional[Union[str, List]], optional): The input for a given embedding request. Defaults to None.
            specific_deployment (Optional[bool], optional): Whether to retrieve a specific deployment. Defaults to False.
            request_kwargs (Optional[Dict], optional): Additional request keyword arguments. Defaults to None.

        Returns:
            Returns an element from litellm.router.model_list
        """
        pass
```
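For completeness, here is a rough sketch of how such a strategy could be attached to a Router, following the pattern in the LiteLLM routing docs. The model names, `api_base` values and keys below are placeholders, not real endpoints:

```python
import asyncio

from litellm import Router

# Hypothetical deployments - model names, api_base values and keys are placeholders.
router = Router(
    model_list=[
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "openai/very-special-endpoint",
                "api_base": "https://special-endpoint.example.com/",  # placeholder
                "api_key": "fake-key",
            },
        },
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "openai/fast-endpoint",
                "api_base": "https://fast-endpoint.example.com/",  # placeholder
                "api_key": "fake-key",
            },
        },
    ],
)

# Register the custom strategy defined above; per the LiteLLM docs this
# overrides the default deployment-selection logic for this Router instance.
router.set_custom_routing_strategy(CustomRoutingStrategy())


async def main():
    response = await router.acompletion(
        model="my-model",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(response)


asyncio.run(main())
```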
-
+1 for implementing smart routing. This is becoming a must-have in AI applications - especially those which serve many different kinds of use cases.
-
This feature seems to be a no-brainer, and I agree with @olad32 that this is best done by integrating smart routing (RouteLLM or a similar style) inside of LiteLLM. I'm very interested in a RAG approach where both strong and weak models can draw off the same database (vector, graph, etc.). Would this still be best integrated into LiteLLM, or should we use RouteLLM and have the strong or weak models call different paths (via LiteLLM)? @krrishdholakia - what do you think? Is a RAG approach (based on the same data source for retrieval) best done as a wrapper or integrated into LiteLLM?
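To make the wrapper variant concrete, here is a rough sketch of what I mean. `retrieve_context()` is a hypothetical helper over the shared store, the model names are just examples, and only `litellm.completion` itself is an existing API:

```python
import litellm


def retrieve_context(query: str) -> str:
    """Hypothetical retrieval over a shared store (vector DB, graph, etc.).

    Both the strong and the weak model see exactly the same context.
    """
    return "relevant documents for: " + query  # placeholder


def answer(query: str, use_strong_model: bool) -> str:
    context = retrieve_context(query)
    # Example model names only; the strong/weak decision would come from
    # RouteLLM or from a LiteLLM routing strategy.
    model = "gpt-4o" if use_strong_model else "mistral/open-mixtral-8x7b"
    response = litellm.completion(
        model=model,
        messages=[
            {"role": "system", "content": f"Use this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```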
-
The Feature
Add a new routing strategy based on the new RouteLLM project.
RouteLLM (https://github.com/lm-sys/RouteLLM) routes a request to either a low-end LLM (e.g. Mixtral) or a high-end LLM (e.g. GPT-4) based on the user prompt (prompt classification is done by a lightweight model: https://huggingface.co/routellm/mf_gpt4_augmented).
This mf_gpt4_augmented model is trained on this particular model pair (Mixtral, GPT-4), but they say it generalizes well to other pairs too, without retraining the model.
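For reference, usage roughly follows the OpenAI client shape, as sketched in the RouteLLM README; the exact model names and the calibrated threshold embedded in the routed `model` string may differ in your setup:

```python
from routellm.controller import Controller

# Sketch based on the RouteLLM README.
client = Controller(
    routers=["mf"],  # matrix-factorization router backed by mf_gpt4_augmented
    strong_model="gpt-4-1106-preview",
    weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

response = client.chat.completions.create(
    # "router-mf-0.11593" means: use the "mf" router with a cost threshold of 0.11593
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```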
Motivation, pitch
This still looks very experimental, as it has to be evaluated for RAG and function-calling cases, but it looks promising for cost reduction.
As a side note, they use LiteLLM internally, but I think a better approach would in fact be the opposite: integrate RouteLLM as a router in LiteLLM.
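Purely as an illustration of what that integration could look like, here is a very rough sketch built on LiteLLM's existing `CustomRoutingStrategyBase` hook. `score_prompt()` is a stand-in for however RouteLLM's classifier would be called, the `tier` key in `model_info` is just an assumed convention (not an existing LiteLLM field), and `router` is the Router instance the strategy is attached to:

```python
from typing import Dict, List, Optional, Union

from litellm.router import CustomRoutingStrategyBase


def score_prompt(prompt: str) -> float:
    """Stand-in for RouteLLM's classifier (e.g. the mf_gpt4_augmented model).

    Returns the estimated probability that the strong model is needed.
    """
    raise NotImplementedError  # hypothetical - RouteLLM would be wired in here


class RouteLLMStrategy(CustomRoutingStrategyBase):
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    async def async_get_available_deployment(
        self,
        model: str,
        messages: Optional[List[Dict[str, str]]] = None,
        input: Optional[Union[str, List]] = None,
        specific_deployment: Optional[bool] = False,
        request_kwargs: Optional[Dict] = None,
    ):
        prompt = messages[-1]["content"] if messages else ""
        tier = "strong" if score_prompt(prompt) >= self.threshold else "weak"
        # Pick the deployment tagged with the chosen tier; the "tier" key in
        # model_info is an assumed convention for this sketch.
        for deployment in router.model_list:
            if deployment.get("model_info", {}).get("tier") == tier:
                return deployment
        return None
```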
Twitter / LinkedIn details
No response