[Frontend] Use request id from header #10968

Merged: 4 commits merged into vllm-project:main from the request-id branch on Dec 10, 2024

Conversation

joerunde (Collaborator) commented Dec 6, 2024

This is a small proposal to populate the request ids in the engine from the X-Request-Id header when running the OpenAI server. This improves traceability: when users pass the header, they can correlate vLLM logs with the request that generated them, which is something our users have been asking for. If the header is not passed, behavior remains the same.

Something to consider is the implication of users potentially passing in the same request id multiple times. Nothing seems to crash when that happens, but it would make it hard to distinguish logs from different requests.
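
For illustration only, here is a minimal client-side sketch of how a caller might exercise this (not part of the PR; the port, endpoint path, and model name are assumptions):

```python
import uuid

import requests

# Hypothetical OpenAI-compatible vLLM server running locally; the model name
# and port are placeholders chosen for this sketch.
request_id = str(uuid.uuid4())
response = requests.post(
    "http://localhost:8000/v1/completions",
    headers={"X-Request-Id": request_id},  # propagated into engine logs by this change
    json={"model": "facebook/opt-125m", "prompt": "Hello", "max_tokens": 8},
)
# The server already echoes X-Request-Id back, so the id printed here can be
# grepped for in both client-side and vLLM logs.
print(request_id, response.headers.get("X-Request-Id"))
```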

github-actions (bot) commented Dec 6, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

mergify bot added the documentation (Improvements or additions to documentation) label on Dec 6, 2024
DarkLight1337 (Member) left a comment

Seems reasonable to me, thanks for enabling more control over this!

DarkLight1337 added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Dec 7, 2024
cjackal (Contributor) commented Dec 9, 2024

I don't know if it's common sense, but I believe web servers should not be affected by a malicious client.

If a (malicious or purely curious) client sends a 1 MB string as the X-Request-Id header on a streaming chat completion with ~20k expected output tokens, handling that request consumes roughly an extra 20 GB of network bandwidth just from repeating the request id in every SSE response chunk, which sounds like a pretty huge workload for a poor API gateway router. Or a company using vLLM as its main LLM serving engine may keep all LLM responses long-term (to oversee their poor workers, I'm not telling my own experience 🤣), and then its poor DB may suffer from an arbitrarily long VARCHAR request_id column.

Of course there is effectively no such malicious client out there, but you never know...

joerunde (Collaborator, Author) commented Dec 9, 2024

> I don't know if it's common sense, but I believe web servers should not be affected by a malicious client.
>
> If a (malicious or purely curious) client sends a 1 MB string as the X-Request-Id header on a streaming chat completion with ~20k expected output tokens, handling that request consumes roughly an extra 20 GB of network bandwidth just from repeating the request id in every SSE response chunk, which sounds like a pretty huge workload for a poor API gateway router. Or a company using vLLM as its main LLM serving engine may keep all LLM responses long-term (to oversee their poor workers, I'm not telling my own experience 🤣), and then its poor DB may suffer from an arbitrarily long VARCHAR request_id column.
>
> Of course there is effectively no such malicious client out there, but you never know...

Ah, that's a really good point! Though I think that problem exists today, since we already set X-Request-Id on the response; this change only propagates that ID into other logs in the engine.

> Or a company using vLLM as its main LLM serving engine...

Speaking from some experience, in this case users don't get to connect to vLLM directly, and another gateway is responsible for validating requests, rate limiting, setting headers etc.
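
As an aside, if a deployment did want to guard against oversized header values at the gateway or server boundary, a rough middleware-style sketch could look like the following (this is not something the PR adds; the 256-character cap and the FastAPI wiring are assumptions for illustration):

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

MAX_REQUEST_ID_LEN = 256  # arbitrary cap, chosen purely for illustration


@app.middleware("http")
async def limit_request_id_length(request: Request, call_next):
    # Reject (or alternatively truncate) absurdly long X-Request-Id values
    # before they are propagated into logs and streamed SSE responses.
    req_id = request.headers.get("X-Request-Id", "")
    if len(req_id) > MAX_REQUEST_ID_LEN:
        return JSONResponse(status_code=400,
                            content={"error": "X-Request-Id header too long"})
    return await call_next(request)
```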

DarkLight1337 merged commit 980ad39 into vllm-project:main on Dec 10, 2024
50 checks passed
WangErXiao (Contributor) commented:

I think the request_id must be guaranteed to be unique since it is used as a key in many places.

```python
def _base_request_id(raw_request: Request,
                     default: Optional[str] = None) -> Optional[str]:
    """Pulls the request id to use from a header, if provided"""
    default = default or random_uuid()
```
DarkLight1337 (Member) replied:

By default, the request ID is still created from UUID.
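
For context, the helper above presumably finishes by preferring the client-supplied header over the generated default. The sketch below is inferred from the snippet and the PR description rather than copied verbatim from the merged code; the import paths and the exact return expression are assumptions:

```python
from typing import Optional

from fastapi import Request

from vllm.utils import random_uuid  # assumed location of vLLM's uuid helper


def _base_request_id(raw_request: Request,
                     default: Optional[str] = None) -> Optional[str]:
    """Pulls the request id to use from a header, if provided."""
    default = default or random_uuid()
    # Assumed completion: use the client-supplied X-Request-Id when present,
    # otherwise fall back to the freshly generated UUID.
    return raw_request.headers.get("X-Request-Id", default)
```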

joerunde (Collaborator, Author) replied:

@WangErXiao In the context of running the OpenAI server, both the v0 and v1 engines now guard against duplicate request ids, so users will get a 400 if they send in concurrent requests with the same id. See: #11036
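
The guard itself lives in #11036 rather than here, but the idea described above can be illustrated with a small sketch (purely illustrative; the class and method names are invented, not taken from vLLM):

```python
class InFlightRequestTracker:
    """Illustrative duplicate-id guard: reject a request id that is already active."""

    def __init__(self) -> None:
        self._in_flight: set[str] = set()

    def start(self, request_id: str) -> None:
        if request_id in self._in_flight:
            # In the server this surfaces as an HTTP 400 to the client.
            raise ValueError(f"request id {request_id!r} is already in flight")
        self._in_flight.add(request_id)

    def finish(self, request_id: str) -> None:
        self._in_flight.discard(request_id)
```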

joerunde deleted the request-id branch on December 11, 2024 at 21:19
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
Labels
documentation (Improvements or additions to documentation), frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

4 participants