[Perf] Embeddings: Use router's O(1) lookup and shared sessions #16344

AlexsanderHamir · 2025-11-07T02:12:13Z

Title

[Perf] Embeddings: Use router's O(1) lookup and shared sessions

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
I have added a screenshot of my new test passing locally
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

⚡️Performance Improvement

Changes

Use router deployment O(1) lookups.
Allow ProxyBaseLLMRequestProcessing to accept the aembedding route.
Route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, header handling, and shared session logic.

Performance Gains

Type	Name	# Requests	Median (ms)	95%ile (ms)	99%ile (ms)	Average (ms)	Current RPS
POST	/embeddings	229947	290	5700	7200	844.22	1216.7
Custom	LiteLLM Overhead Duration (ms)	229941	14	98	130	31.89	1216.7
	Aggregated	459888	160	1200	6800	438.06	2433.4

Type	Name	# Requests	Median (ms)	95%ile (ms)	99%ile (ms)	Average (ms)	Current RPS
POST	/embeddings	344132	230	430	780	262.52	995.6
Custom	LiteLLM Overhead Duration (ms)	344132	6	28	47	9.46	995.6
	Aggregated	688264	150	360	570	135.99	1991.2

Manual Test

This test was performed with a real API key using the model text-embedding-3-large

- allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks - route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses - tighten token array decoding logic by using router deployment lookups and the unified error handler

The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected. This was caused by a combination of three distinct issues: 1. In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup. 2. In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers. 3. In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists. Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions.

vercel · 2025-11-07T02:12:19Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
litellm	Ready	Preview	Comment	Nov 9, 2025 1:29am

Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload.

The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.

ishaan-jaff

ishaan-jaff · 2025-11-08T19:54:29Z

litellm/proxy/common_request_processing.py


        ## LOGGING OBJECT ## - initialize logging object for logging success/failure events for call
        ## IMPORTANT Note: - initialize this before running pre-call checks. Ensures we log rejected requests to langfuse.
+        if route_type == "aembedding":


why is this being added ?

This was added while exploring the code and accidentally left in. Removed it as it’s unnecessary.

ishaan-jaff · 2025-11-08T19:55:39Z

litellm/proxy/proxy_server.py

            and len(data["input"]) > 0
            and isinstance(data["input"][0], list)
-            and isinstance(data["input"][0][0], int)
+            and all(isinstance(token, int) for token in data["input"][0])


why are we changing this ?

This was included by mistake. It belongs to a different branch and is unrelated to this PR.

I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it.

This change was not related to the embeddings refactor and actually belonged to a different branch.

AlexsanderHamir added 2 commits November 6, 2025 16:16

AlexsanderHamir requested a review from ishaan-jaff November 7, 2025 02:12

Merge branch 'main' into litellm_embeddings_fix

7ce1aaf

vercel bot deployed to Preview November 7, 2025 16:24 View deployment

AlexsanderHamir marked this pull request as ready for review November 7, 2025 23:03

Merge remote-tracking branch 'origin/main' into litellm_embeddings_fix

758b3da

vercel bot deployed to Preview November 7, 2025 23:12 View deployment

vercel bot deployed to Preview November 8, 2025 00:17 View deployment

Update proxy exception test

4e98013

The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.

vercel bot deployed to Preview November 8, 2025 00:35 View deployment

ishaan-jaff requested changes Nov 8, 2025

View reviewed changes

testing: unsure of this change

ce26c26

I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it.

vercel bot deployed to Preview November 8, 2025 20:30 View deployment

fix: remove unrelated change

d0b0adb

This change was not related to the embeddings refactor and actually belonged to a different branch.

vercel bot deployed to Preview November 8, 2025 20:52 View deployment

vercel bot deployed to Preview November 8, 2025 22:12 View deployment

vercel bot deployed to Preview November 8, 2025 22:51 View deployment

vercel bot deployed to Preview November 9, 2025 01:23 View deployment

Merge: Fix recurrent merge conflic

bb9c749

AlexsanderHamir force-pushed the litellm_embeddings_fix branch from ef01eba to bb9c749 Compare November 9, 2025 01:26

vercel bot deployed to Preview November 9, 2025 01:29 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Perf] Embeddings: Use router's O(1) lookup and shared sessions #16344

[Perf] Embeddings: Use router's O(1) lookup and shared sessions #16344

AlexsanderHamir commented Nov 7, 2025 •

edited

Loading

Uh oh!

vercel bot commented Nov 7, 2025 •

edited

Loading

Uh oh!

ishaan-jaff left a comment

Uh oh!

ishaan-jaff Nov 8, 2025

Uh oh!

AlexsanderHamir Nov 8, 2025

Uh oh!

ishaan-jaff Nov 8, 2025

Uh oh!

AlexsanderHamir Nov 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

[Perf] Embeddings: Use router's O(1) lookup and shared sessions #16344

Are you sure you want to change the base?

[Perf] Embeddings: Use router's O(1) lookup and shared sessions #16344

Conversation

AlexsanderHamir commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Title

Relevant issues

Pre-Submission checklist

Type

Changes

Performance Gains

Manual Test

Uh oh!

vercel bot commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ishaan-jaff left a comment

Choose a reason for hiding this comment

Uh oh!

ishaan-jaff Nov 8, 2025

Choose a reason for hiding this comment

Uh oh!

AlexsanderHamir Nov 8, 2025

Choose a reason for hiding this comment

Uh oh!

ishaan-jaff Nov 8, 2025

Choose a reason for hiding this comment

Uh oh!

AlexsanderHamir Nov 8, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

AlexsanderHamir commented Nov 7, 2025 •

edited

Loading

vercel bot commented Nov 7, 2025 •

edited

Loading