
Support retry policy for completion / acompletion #6916

Open

dbczumar wants to merge 25 commits into main

Conversation

dbczumar
Contributor

@dbczumar dbczumar commented Nov 26, 2024

Title

Support retry policy for completion / acompletion

Relevant issues

Fixes #6623

Type

🆕 New Feature

Changes

Moves policy-based retry logic from Router into a utils file, which is then called by Router and wrapper / async_wrapper
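
For illustration only, a minimal sketch of what policy-based retry selection looks like conceptually. This is not the code added by this PR; retries_for_exception is a hypothetical helper name, and only the RetryPolicy fields exercised in the test scripts below are shown.

import litellm
from litellm import RetryPolicy


def retries_for_exception(exc: Exception, policy: RetryPolicy, default: int = 0) -> int:
    # Map a raised exception to the retry budget configured on the policy.
    # Check more specific exception types before the generic APIError.
    if isinstance(exc, litellm.exceptions.Timeout):
        retries = policy.TimeoutErrorRetries
    elif isinstance(exc, litellm.exceptions.RateLimitError):
        retries = policy.RateLimitErrorRetries
    elif isinstance(exc, litellm.exceptions.APIConnectionError):
        retries = policy.APIConnectionErrorRetries
    elif isinstance(exc, litellm.exceptions.APIError):
        retries = policy.APIErrorRetries
    else:
        retries = None
    return default if retries is None else retries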

[REQUIRED] Testing - Attach a screenshot of any new tests passing locally

If UI changes, send a screenshot/GIF of working UI fixes

  • Added unit tests (screenshot: new unit tests passing locally, 2024-11-26 at 3:35 AM)
  • Manually verified that retry policies are respected by creating a rate-limited Databricks LLM endpoint and sending enough requests to exceed the limit, then confirming that the number of rate-limit retries configured by the retry policy is used:
import litellm
import uuid
from litellm import completion, acompletion, embedding, aembedding, RetryPolicy

retry_policy = RetryPolicy(
    TimeoutErrorRetries=0,
    RateLimitErrorRetries=5,
    APIErrorRetries=0,
    APIConnectionErrorRetries=0,
    backoff_factor=0
)


litellm.set_verbose=True
for i in range(100):
    print(litellm.completion(
        model="databricks/corey-limited",
        messages=[{"role": "user", "content": "Tell me a joke." + uuid.uuid4().hex}],
        retry_policy=retry_policy
    ))
....
 File "/Users/corey.zumar/litellm/litellm/utils.py", line 915, in _wrapper
    result = original_function(*args, **kwargs)
  File "/Users/corey.zumar/litellm/litellm/main.py", line 3061, in completion
    raise exception_type(
  File "/Users/corey.zumar/litellm/litellm/litellm_core_utils/exception_mapping_utils.py", line 2136, in exception_type
    raise e
  File "/Users/corey.zumar/litellm/litellm/litellm_core_utils/exception_mapping_utils.py", line 708, in exception_type
    raise RateLimitError(
litellm.exceptions.RateLimitError: litellm.RateLimitError: databricksException - {"error_code":"REQUEST_LIMIT_EXCEEDED","message":"REQUEST_LIMIT_EXCEEDED: User defined rate limit(s) exceeded for endpoint: corey-limited."} LiteLLM Retried: 4 times, LiteLLM Max Retries: 5

I also repeated this exercise with acompletion:

import asyncio

import litellm
import uuid
from litellm import completion, acompletion, embedding, aembedding, RetryPolicy

retry_policy = RetryPolicy(
    TimeoutErrorRetries=0,
    RateLimitErrorRetries=5,
    APIErrorRetries=0,
    APIConnectionErrorRetries=0,
    backoff_factor=0
)


litellm.set_verbose=True
for i in range(100):
    print(asyncio.run(litellm.acompletion(
        model="databricks/corey-limited",
        messages=[{"role": "user", "content": "Tell me a joke." + uuid.uuid4().hex}],
        retry_policy=retry_policy
    )))
...
  File "/Users/corey.zumar/litellm/litellm/litellm_core_utils/exception_mapping_utils.py", line 2136, in exception_type
    raise e
  File "/Users/corey.zumar/litellm/litellm/litellm_core_utils/exception_mapping_utils.py", line 708, in exception_type
    raise RateLimitError(
litellm.exceptions.RateLimitError: litellm.RateLimitError: databricksException - {"error_code":"REQUEST_LIMIT_EXCEEDED","message":"REQUEST_LIMIT_EXCEEDED: User defined rate limit(s) exceeded for endpoint: corey-limited."} LiteLLM Retried: 4 times, LiteLLM Max Retries: 5


try:
@dbczumar dbczumar (Contributor, Author) commented Nov 26, 2024
The majority of the diff in wrapper below this line is just un-indenting the try / catch block now that retry logic from Router is used instead. You can ignore everything except https://github.com/BerriAI/litellm/pull/6916/files#r1860469861 until https://github.com/BerriAI/litellm/pull/6916/files#r1858292742

litellm/utils.py Outdated
Comment on lines 915 to 960
except Exception as e:
    call_type = original_function.__name__
    if call_type == CallTypes.completion.value:
        num_retries = (
            kwargs.get("num_retries", None) or litellm.num_retries or None
        )
        litellm.num_retries = (
            None  # set retries to None to prevent infinite loops
        )
        context_window_fallback_dict = kwargs.get(
            "context_window_fallback_dict", {}
        )

        _is_litellm_router_call = "model_group" in kwargs.get(
            "metadata", {}
        )  # check if call from litellm.router/proxy
        if (
            num_retries and not _is_litellm_router_call
        ):  # only enter this if call is not from litellm router/proxy. router has it's own logic for retrying
            if (
                isinstance(e, openai.APIError)
                or isinstance(e, openai.Timeout)
                or isinstance(e, openai.APIConnectionError)
            ):
                kwargs["num_retries"] = num_retries
                return litellm.completion_with_retries(*args, **kwargs)
        elif (
            isinstance(e, litellm.exceptions.ContextWindowExceededError)
            and context_window_fallback_dict
            and model in context_window_fallback_dict
            and not _is_litellm_router_call
        ):
            if len(args) > 0:
                args[0] = context_window_fallback_dict[model]  # type: ignore
            else:
                kwargs["model"] = context_window_fallback_dict[model]
            return original_function(*args, **kwargs)
    traceback_exception = traceback.format_exc()
    end_time = datetime.datetime.now()

    # LOG FAILURE - handle streaming failure logging in the _next_ object, remove `handle_failure` once it's deprecated
    if logging_obj:
        logging_obj.failure_handler(
            e, traceback_exception, start_time, end_time
        )  # DO NOT MAKE THREADED - router retry fallback relies on this!
    raise e
dbczumar (Contributor, Author)

This logic has now been removed, since we're now reusing the retry logic from Router.

Comment on lines -1118 to -1175
except Exception as e:
    traceback_exception = traceback.format_exc()
    end_time = datetime.datetime.now()
    if logging_obj:
        try:
            logging_obj.failure_handler(
                e, traceback_exception, start_time, end_time
            )  # DO NOT MAKE THREADED - router retry fallback relies on this!
        except Exception as e:
            raise e
        try:
            await logging_obj.async_failure_handler(
                e, traceback_exception, start_time, end_time
            )
        except Exception as e:
            raise e

    call_type = original_function.__name__
    if call_type == CallTypes.acompletion.value:
        num_retries = (
            kwargs.get("num_retries", None) or litellm.num_retries or None
        )
        litellm.num_retries = (
            None  # set retries to None to prevent infinite loops
        )
        context_window_fallback_dict = kwargs.get(
            "context_window_fallback_dict", {}
        )

        _is_litellm_router_call = "model_group" in kwargs.get(
            "metadata", {}
        )  # check if call from litellm.router/proxy
        if (
            num_retries and not _is_litellm_router_call
        ):  # only enter this if call is not from litellm router/proxy. router has it's own logic for retrying
            try:
                kwargs["num_retries"] = num_retries
                kwargs["original_function"] = original_function
                if isinstance(
                    e, openai.RateLimitError
                ):  # rate limiting specific error
                    kwargs["retry_strategy"] = "exponential_backoff_retry"
                elif isinstance(e, openai.APIError):  # generic api error
                    kwargs["retry_strategy"] = "constant_retry"
                return await litellm.acompletion_with_retries(*args, **kwargs)
            except Exception:
                pass
        elif (
            isinstance(e, litellm.exceptions.ContextWindowExceededError)
            and context_window_fallback_dict
            and model in context_window_fallback_dict
        ):
            if len(args) > 0:
                args[0] = context_window_fallback_dict[model]  # type: ignore
            else:
                kwargs["model"] = context_window_fallback_dict[model]
            return await original_function(*args, **kwargs)
    raise e
dbczumar (Contributor, Author)

This logic has now been removed, since we're now reusing the retry logic from Router.

):
raise ValueError("model param not passed in.")

try:
@dbczumar dbczumar (Contributor, Author) commented Nov 26, 2024

The majority of the diff in wrapper below this line is just un-indenting the try / catch block now that retry logic from Router is used instead. You can ignore everything except https://github.com/BerriAI/litellm/pull/6916/files#r1860468305 until https://github.com/BerriAI/litellm/pull/6916/files#r1858295149

litellm/main.py Outdated
@@ -372,6 +372,8 @@ async def acompletion(
LITELLM Specific Params
mock_response (str, optional): If provided, return a mock completion response for testing or debugging purposes (default is None).
custom_llm_provider (str, optional): Used for Non-OpenAI LLMs, Example usage for bedrock, set model="amazon.titan-tg1-large" and custom_llm_provider="bedrock"
max_retries (int, optional): The number of retries to attempt (default is 0).
@dbczumar dbczumar (Contributor, Author) commented Nov 26, 2024

acompletion already supports max_retries today; it just wasn't documented here.
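
For illustration, an assumed usage sketch (not code from this PR) passing max_retries directly to acompletion without a Router; the model name reuses the rate-limited test endpoint from the scripts above.

import asyncio

import litellm


async def main():
    # Retry the raw acompletion call up to 2 times on retryable errors.
    response = await litellm.acompletion(
        model="databricks/corey-limited",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        max_retries=2,
    )
    print(response)


asyncio.run(main())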


verbose_router_logger.debug(
f"async function w/ retries: original_function - {original_function}, num_retries - {num_retries}"
)
try:
@dbczumar dbczumar (Contributor, Author) commented Nov 26, 2024

All of the deleted code below was moved into a utils.py file so that it can be shared with wrapper / wrapper_async, which power the completion() and acompletion() APIs

@@ -0,0 +1,394 @@
import asyncio
dbczumar (Contributor, Author)

The contents of this file were moved from router.py: https://github.com/BerriAI/litellm/pull/6916/files#r1858305693. There aren't any other notable changes (the preexisting logic is preserved).
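
As a reading aid, a rough sketch of the shared entry points implied by the call sites shown further down in this diff (keyword arguments taken from those call sites; keyword-only signatures are an assumption and the bodies are elided, so this is not the file's actual contents):

def run_with_retries(
    *, original_function, original_function_args, original_function_kwargs,
    num_retries, retry_after, retry_policy, fallbacks, context_window_fallbacks,
    content_policy_fallbacks, get_healthy_deployments, log_retry, model_list
):
    """Sync retry/fallback loop shared by Router and the completion() wrapper."""
    ...


async def async_run_with_retries(
    *, original_function, original_function_args, original_function_kwargs,
    num_retries, retry_after, retry_policy, fallbacks, context_window_fallbacks,
    content_policy_fallbacks, get_healthy_deployments, log_retry, model_list
):
    """Async retry/fallback loop shared by Router and the acompletion() wrapper."""
    ...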

@dbczumar (Contributor, Author)

@krrishdholakia Can you take a look at this PR?

Comment on lines +1041 to +1059
num_retries = _get_and_reset_retries_for_wrapper_call(kwargs)
result = await async_run_with_retries(
    original_function=original_function,
    original_function_args=args,
    original_function_kwargs=kwargs,
    num_retries=num_retries,
    retry_after=0,
    retry_policy=kwargs.get("retry_policy"),
    fallbacks=kwargs.get("fallbacks", []),
    context_window_fallbacks=kwargs.get("context_window_fallback_dict", {}).get(
        model, []
    ),
    content_policy_fallbacks=[],
    get_healthy_deployments=lambda *args, **kwargs: _get_mock_healthy_deployments(
        model
    ),
    log_retry=lambda kwargs, e: kwargs,
    model_list=[],
)
dbczumar (Contributor, Author)

Updated model call to perform retries

Comment on lines +892 to +909
result = run_with_retries(
    original_function=original_function,
    original_function_args=args,
    original_function_kwargs=kwargs,
    num_retries=num_retries,
    retry_after=0,
    retry_policy=kwargs.get("retry_policy"),
    fallbacks=kwargs.get("fallbacks", []),
    context_window_fallbacks=kwargs.get("context_window_fallback_dict", {}).get(
        model, []
    ),
    content_policy_fallbacks=[],
    get_healthy_deployments=lambda *args, **kwargs: _get_mock_healthy_deployments(
        model
    ),
    log_retry=lambda kwargs, e: kwargs,
    model_list=[],
)
dbczumar (Contributor, Author)

Updated model call to perform retries

Successfully merging this pull request may close these issues.

[Feature]: Support retry policies when calling completion() / text_completion() without requiring Router