-
-
Notifications
You must be signed in to change notification settings - Fork 4.6k
[Perf] Embeddings: Use router's O(1) lookup and shared sessions #16344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks - route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses - tighten token array decoding logic by using router deployment lookups and the unified error handler
The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected. This was caused by a combination of three distinct issues: 1. In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup. 2. In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers. 3. In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists. Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload.
The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.
ishaan-jaff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| ## LOGGING OBJECT ## - initialize logging object for logging success/failure events for call | ||
| ## IMPORTANT Note: - initialize this before running pre-call checks. Ensures we log rejected requests to langfuse. | ||
| if route_type == "aembedding": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this being added ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added while exploring the code and accidentally left in. Removed it as it’s unnecessary.
litellm/proxy/proxy_server.py
Outdated
| and len(data["input"]) > 0 | ||
| and isinstance(data["input"][0], list) | ||
| and isinstance(data["input"][0][0], int) | ||
| and all(isinstance(token, int) for token in data["input"][0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we changing this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was included by mistake. It belongs to a different branch and is unrelated to this PR.
I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it.
This change was not related to the embeddings refactor and actually belonged to a different branch.
ef01eba to
bb9c749
Compare
Title
[Perf] Embeddings: Use router's O(1) lookup and shared sessions
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unitType
⚡️Performance Improvement
Changes
Performance Gains
Before | 4 CPUs | 8 GB RAM | 4 Instances | DB | No Redis
After | 4 CPUs | 8 GB RAM | 4 Instances | DB | No Redis
Manual Test
This test was performed with a real API key using the model
text-embedding-3-large