Replies: 3 comments 6 replies
-
hey @olad32 my concern with this is that requiring the proxy to run with a model would require deploying on a GPU. Is this not an approach you can already try with the custom routing strategy implementation - https://docs.litellm.ai/docs/routing#advanced---routing-strategies-%EF%B8%8F

```python
from typing import Dict, List, Optional, Union

from litellm.router import CustomRoutingStrategyBase


class CustomRoutingStrategy(CustomRoutingStrategyBase):
    async def async_get_available_deployment(
        self,
        model: str,
        messages: Optional[List[Dict[str, str]]] = None,
        input: Optional[Union[str, List]] = None,
        specific_deployment: Optional[bool] = False,
        request_kwargs: Optional[Dict] = None,
    ):
        """
        Asynchronously retrieves the available deployment based on the given parameters.

        Args:
            model (str): The name of the model.
            messages (Optional[List[Dict[str, str]]], optional): The list of messages for a given request. Defaults to None.
            input (Optional[Union[str, List]], optional): The input for a given embedding request. Defaults to None.
            specific_deployment (Optional[bool], optional): Whether to retrieve a specific deployment. Defaults to False.
            request_kwargs (Optional[Dict], optional): Additional request keyword arguments. Defaults to None.

        Returns:
            Returns an element from litellm.router.model_list
        """
        print("In CUSTOM async get available deployment")
        # `router` is the litellm Router instance this strategy is attached to,
        # configured elsewhere in your application.
        model_list = router.model_list
        print("router model list=", model_list)
        for deployment in model_list:
            if isinstance(deployment, dict):
                if deployment["litellm_params"]["model"] == "openai/very-special-endpoint":
                    return deployment

    def get_available_deployment(
        self,
        model: str,
        messages: Optional[List[Dict[str, str]]] = None,
        input: Optional[Union[str, List]] = None,
        specific_deployment: Optional[bool] = False,
        request_kwargs: Optional[Dict] = None,
    ):
        """
        Synchronously retrieves the available deployment based on the given parameters.

        Args:
            model (str): The name of the model.
            messages (Optional[List[Dict[str, str]]], optional): The list of messages for a given request. Defaults to None.
            input (Optional[Union[str, List]], optional): The input for a given embedding request. Defaults to None.
            specific_deployment (Optional[bool], optional): Whether to retrieve a specific deployment. Defaults to False.
            request_kwargs (Optional[Dict], optional): Additional request keyword arguments. Defaults to None.

        Returns:
            Returns an element from litellm.router.model_list
        """
        pass
```
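For completeness, here is a rough sketch of how such a strategy could be attached to a Router, following the pattern in the LiteLLM routing docs. The model names, `api_base` values and keys below are placeholders, not real endpoints:

```python
import asyncio

from litellm import Router

# Hypothetical deployments - model names, api_base values and keys are placeholders.
router = Router(
    model_list=[
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "openai/very-special-endpoint",
                "api_base": "https://special-endpoint.example.com/",  # placeholder
                "api_key": "fake-key",
            },
        },
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "openai/fast-endpoint",
                "api_base": "https://fast-endpoint.example.com/",  # placeholder
                "api_key": "fake-key",
            },
        },
    ],
)

# Register the custom strategy defined above; per the LiteLLM docs this
# overrides the default deployment-selection logic for this Router instance.
router.set_custom_routing_strategy(CustomRoutingStrategy())


async def main():
    response = await router.acompletion(
        model="my-model",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(response)


asyncio.run(main())
```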
-
+1 for implementing smart routing. This is becoming a must-have in AI applications - especially those which serve many different kinds of use cases.
-
This feature seems to be a no-brainer, and I agree with @olad32 that this is best done by integrating smart routing (RouteLLM or a similar style) inside of LiteLLM. I'm very interested in a RAG approach where both strong and weak models can draw off the same database (vector, graph, etc.). Would this still be best integrated into LiteLLM, or should we use RouteLLM and have the strong or weak models call different paths (via LiteLLM)? @krrishdholakia - what do you think? Is a RAG approach (based on the same data source for retrieval) best done as a wrapper or integrated into LiteLLM?
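To make the wrapper variant concrete, here is a rough sketch of what I mean. `retrieve_context()` is a hypothetical helper over the shared store, the model names are just examples, and only `litellm.completion` itself is an existing API:

```python
import litellm


def retrieve_context(query: str) -> str:
    """Hypothetical retrieval over a shared store (vector DB, graph, etc.).

    Both the strong and the weak model see exactly the same context.
    """
    return "relevant documents for: " + query  # placeholder


def answer(query: str, use_strong_model: bool) -> str:
    context = retrieve_context(query)
    # Example model names only; the strong/weak decision would come from
    # RouteLLM or from a LiteLLM routing strategy.
    model = "gpt-4o" if use_strong_model else "mistral/open-mixtral-8x7b"
    response = litellm.completion(
        model=model,
        messages=[
            {"role": "system", "content": f"Use this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```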
-
The Feature
Add a new routing strategy based on the new RouteLLM project.
RouteLLM (https://github.com/lm-sys/RouteLLM) routes a request to either a low-end LLM (e.g. Mixtral) or a high-end LLM (e.g. GPT-4) based on the user prompt (prompt classification is done by a lightweight model: https://huggingface.co/routellm/mf_gpt4_augmented).
This mf_gpt4_augmented model is trained on this particular model pair (Mixtral, GPT-4), but they say it generalizes well to other pairs too, without retraining the model.
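For reference, usage roughly follows the OpenAI client shape, as sketched in the RouteLLM README; the exact model names and the calibrated threshold embedded in the routed `model` string may differ in your setup:

```python
from routellm.controller import Controller

# Sketch based on the RouteLLM README.
client = Controller(
    routers=["mf"],  # matrix-factorization router backed by mf_gpt4_augmented
    strong_model="gpt-4-1106-preview",
    weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

response = client.chat.completions.create(
    # "router-mf-0.11593" means: use the "mf" router with a cost threshold of 0.11593
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```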
Motivation, pitch
This still looks very experimental, as it has to be evaluated for RAG and function-calling cases, but it looks promising for cost reduction.
As a side note, they use LiteLLM internally, but I think a better approach would in fact be the opposite: integrate RouteLLM as a router in LiteLLM.
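Purely as an illustration of what that integration could look like, here is a very rough sketch built on LiteLLM's existing `CustomRoutingStrategyBase` hook. `score_prompt()` is a stand-in for however RouteLLM's classifier would be called, the `tier` key in `model_info` is just an assumed convention (not an existing LiteLLM field), and `router` is the Router instance the strategy is attached to:

```python
from typing import Dict, List, Optional, Union

from litellm.router import CustomRoutingStrategyBase


def score_prompt(prompt: str) -> float:
    """Stand-in for RouteLLM's classifier (e.g. the mf_gpt4_augmented model).

    Returns the estimated probability that the strong model is needed.
    """
    raise NotImplementedError  # hypothetical - RouteLLM would be wired in here


class RouteLLMStrategy(CustomRoutingStrategyBase):
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    async def async_get_available_deployment(
        self,
        model: str,
        messages: Optional[List[Dict[str, str]]] = None,
        input: Optional[Union[str, List]] = None,
        specific_deployment: Optional[bool] = False,
        request_kwargs: Optional[Dict] = None,
    ):
        prompt = messages[-1]["content"] if messages else ""
        tier = "strong" if score_prompt(prompt) >= self.threshold else "weak"
        # Pick the deployment tagged with the chosen tier; the "tier" key in
        # model_info is an assumed convention for this sketch.
        for deployment in router.model_list:
            if deployment.get("model_info", {}).get("tier") == tier:
                return deployment
        return None
```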
Twitter / LinkedIn details
No response