
Conversation

AlexsanderHamir (Collaborator) commented on Nov 7, 2025

Title

[Perf] Embeddings: Use router's O(1) lookup and shared sessions

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement).
  • I have added a screenshot of my new test passing locally.
  • My PR passes all unit tests on make test-unit.
  • My PR's scope is as isolated as possible; it only solves 1 specific problem.

Type

⚡️Performance Improvement

Changes

  • Use the router's O(1) deployment lookup (a sketch follows this list).
  • Allow ProxyBaseLLMRequestProcessing to accept the aembedding route.
  • Route embedding requests through base_process_llm_request, sharing logging, hook execution, retries, header handling, and shared-session logic with chat/responses.
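
A rough, hedged sketch of the lookup change: the "before" scan and the model-list shape are assumptions for illustration, not the PR's actual call sites.

```python
# Minimal sketch, assuming litellm's Router API; not the PR's actual code.
from litellm import Router

llm_router = Router(
    model_list=[
        {
            "model_name": "text-embedding-3-large",
            "litellm_params": {"model": "openai/text-embedding-3-large"},
        }
    ]
)

# Before: linear scan over the deployment list, O(n) per request.
deployment = next(
    (
        d
        for d in (llm_router.get_model_list() or [])
        if d["model_name"] == "text-embedding-3-large"
    ),
    None,
)

# After: direct lookup keyed by the deployment's id, O(1).
# Note: get_deployment takes `model_id`, not `model_name` (see the fix list below).
model_id = deployment["model_info"]["id"]
deployment = llm_router.get_deployment(model_id=model_id)
```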

Performance Gains

Before | 4 CPUs | 8 GB RAM | 4 Instances | DB | No Redis

| Type | Name | # Requests | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Current RPS |
|------|------|------------|-------------|-------------|-------------|--------------|-------------|
| POST | /embeddings | 229947 | 290 | 5700 | 7200 | 844.22 | 1216.7 |
| Custom | LiteLLM Overhead Duration (ms) | 229941 | 14 | 98 | 130 | 31.89 | 1216.7 |
| | Aggregated | 459888 | 160 | 1200 | 6800 | 438.06 | 2433.4 |

After | 4 CPUs | 8 GB RAM | 4 Instances | DB | No Redis

| Type | Name | # Requests | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Current RPS |
|------|------|------------|-------------|-------------|-------------|--------------|-------------|
| POST | /embeddings | 344132 | 230 | 430 | 780 | 262.52 | 995.6 |
| Custom | LiteLLM Overhead Duration (ms) | 344132 | 6 | 28 | 47 | 9.46 | 995.6 |
| | Aggregated | 688264 | 150 | 360 | 570 | 135.99 | 1991.2 |

Manual Test

This test was performed with a real API key using the model text-embedding-3-large.

[Three screenshots attached (Nov 7, 2025) showing the manual test passing]

- Allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embedding requests reuse the base pipeline hooks.

- Route embedding requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses.

- Tighten token-array decoding logic by using router deployment lookups and the unified error handler.

The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly, which prevented the `aembedding` function from being called as expected.

This was caused by a combination of three distinct issues:

1.  In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup.

2.  In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers (see the sketch after this list).

3.  In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists.
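
A minimal, self-contained sketch of fixes 2 and 3 above. The helper name and data shapes are illustrative assumptions, not the PR's actual code; the `get_deployment` correction itself is simply passing `model_id=` instead of `model_name=`.

```python
# Fix 2 (sketch): classify an embeddings input as a token array only when the
# first subarray consists entirely of ints, not just when its first element is one.
def is_token_array_input(data: dict) -> bool:
    inp = data.get("input")
    return (
        isinstance(inp, list)
        and len(inp) > 0
        and isinstance(inp[0], list)
        and all(isinstance(token, int) for token in inp[0])
    )

assert is_token_array_input({"input": [[1, 2, 3]]})
# A first-element-only check would wrongly accept this mixed array:
assert not is_token_array_input({"input": [[1, "two", 3]]})

# Fix 3 (sketch): run the enterprise enforced_params check only for non-empty
# lists, so an empty list no longer blocks valid requests.
enforced_params: list = []  # e.g. read from key metadata / general settings
if enforced_params:  # falsy for both None and [], truthy only when non-empty
    pass  # run the enforced-params validation here
```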

Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions.
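
A hedged sketch of the relaxed assertion pattern; the direct mock call below stands in for the proxy routing a request, and the argument values are illustrative, not copied from the actual test.

```python
from unittest.mock import MagicMock

mock_aembedding = MagicMock()

# Stand-in for the proxy invoking aembedding; the real test patches the
# function and drives the /embeddings route instead.
mock_aembedding(
    model="text-embedding-3-large",
    input=[[1, 2, 3]],
    litellm_call_id="internal-id",  # the kind of internal kwarg that broke strict matching
)

mock_aembedding.assert_called_once()  # was: assert_called_once_with(...)
kwargs = mock_aembedding.call_args.kwargs
assert kwargs["model"] == "text-embedding-3-large"
assert kwargs["input"] == [[1, 2, 3]]
```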

vercel bot commented Nov 7, 2025

The latest updates on your projects:

| Project | Deployment | Preview | Comments | Updated (UTC) |
|---------|------------|---------|----------|---------------|
| litellm | Ready | Preview | Comment | Nov 9, 2025 1:29am |

Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload.
The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.
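
One way a test can accept those forwarded kwargs without pinning their values is `unittest.mock.ANY`. A minimal sketch, with mock names and argument values assumed rather than taken from the actual test:

```python
from unittest.mock import ANY, MagicMock

mock_router = MagicMock()

# Stand-in for the proxy call; the real test drives the /embeddings route.
mock_router.aembedding(
    model="text-embedding-3-large",
    input=["hello world"],
    request_timeout=600,
    litellm_call_id="some-uuid",
    litellm_logging_obj=object(),
)

# ANY matches whatever value was forwarded, so the assertion keeps validating
# model/input while tolerating the extra control kwargs.
mock_router.aembedding.assert_called_once_with(
    model="text-embedding-3-large",
    input=["hello world"],
    request_timeout=ANY,
    litellm_call_id=ANY,
    litellm_logging_obj=ANY,
)
```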
ishaan-jaff (Contributor) left a comment:

```python
## LOGGING OBJECT ## - initialize logging object for logging success/failure events for call
## IMPORTANT Note: - initialize this before running pre-call checks. Ensures we log rejected requests to langfuse.
if route_type == "aembedding":
```
Contributor comment on the diff above:

why is this being added?

AlexsanderHamir (Collaborator, Author) replied:

This was added while exploring the code and accidentally left in. Removed it as it’s unnecessary.

```python
and len(data["input"]) > 0
and isinstance(data["input"][0], list)
and isinstance(data["input"][0][0], int)
and all(isinstance(token, int) for token in data["input"][0])
```
Contributor comment on the diff above:

why are we changing this?

AlexsanderHamir (Collaborator, Author) replied:

This was included by mistake. It belongs to a different branch and is unrelated to this PR.

I don't remember why I changed this; I will revert and see if any tests fail, since the manual test isn't failing without it.

This change was not related to the embeddings refactor and actually belonged to a different branch.