[Frontend] Use request id from header #10968
Conversation
Seems reasonable to me, thanks for enabling more control over this!
This may be common sense, but I believe web servers shouldn't be vulnerable to malicious clients. If a (malicious or merely curious) client sends a 1 MB string as the `X-Request-Id` header on a streaming chat completion with ~20k expected output tokens, handling that request consumes roughly an extra 20 GB of network bandwidth just from repeating the request id in every SSE chunk, which is a huge workload for a poor API gateway. Or a company using vLLM as its main LLM serving engine might retain all LLM responses long-term (to oversee their poor workers; I'm not speaking from my own experience 🤣), and then its poor DB would suffer from an arbitrarily long VARCHAR column. Of course, no such hostile client may exist in practice, but who knows when one might show up...
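One way a deployment could cap this risk is a small middleware that rejects oversized ids before they ever reach the engine. A minimal sketch, assuming FastAPI and a made-up length limit (nothing like this is in the PR):

```python
# Hypothetical mitigation, not part of this PR: reject oversized
# X-Request-Id values at the edge.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
MAX_REQUEST_ID_LEN = 256  # assumed limit, tune per deployment


@app.middleware("http")
async def limit_request_id_length(request: Request, call_next):
    request_id = request.headers.get("X-Request-Id", "")
    if len(request_id) > MAX_REQUEST_ID_LEN:
        return JSONResponse(
            status_code=400,
            content={"error": "X-Request-Id header too long"})
    return await call_next(request)
```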
Ah, that's a really good point! Though I think that problem exists today, since we already echo `X-Request-Id` back on responses. Speaking from some experience, in these deployments users don't connect to vLLM directly; another gateway is responsible for validating requests, rate limiting, setting headers, etc.
I think the request_id must be guaranteed to be unique since it is used as a key in many places.
```python
from typing import Optional

from fastapi import Request

from vllm.utils import random_uuid


def _base_request_id(raw_request: Request,
                     default: Optional[str] = None) -> Optional[str]:
    """Pulls the request id to use from a header, if provided"""
    default = default or random_uuid()
    # X-Request-Id is the header named in the PR description.
    return raw_request.headers.get("X-Request-Id", default)
```
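A quick usage sketch of the helper above, assuming a FastAPI route and logging setup (illustrative, not the PR's actual handler code):

```python
import logging

from fastapi import FastAPI, Request

logger = logging.getLogger(__name__)
app = FastAPI()


@app.post("/v1/completions")
async def create_completion(raw_request: Request):
    # A client-supplied X-Request-Id wins; otherwise we fall back to a
    # freshly generated UUID, matching the previous behavior.
    request_id = _base_request_id(raw_request)
    logger.info("Handling request %s", request_id)
    return {"id": request_id}
```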
By default, the request ID is still generated from a random UUID.
@WangErXiao In the context of running the OpenAI server, both the v0 and v1 engines now guard against duplicate request ids, so users will get a 400 if they send in concurrent requests with the same id. See: #11036
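Conceptually the guard works like this sketch: track in-flight ids and reject duplicates with a 400. Names here are illustrative; see #11036 for the actual implementation:

```python
# Illustrative duplicate-id guard, not vLLM's actual code.
from fastapi import HTTPException

_in_flight_ids: set[str] = set()


def register_request(request_id: str) -> None:
    """Reject a request id that is already being processed."""
    if request_id in _in_flight_ids:
        raise HTTPException(status_code=400,
                            detail=f"Request {request_id} already running")
    _in_flight_ids.add(request_id)


def finish_request(request_id: str) -> None:
    _in_flight_ids.discard(request_id)
```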
This is a small proposal to populate the request ids in the engine from the `X-Request-Id` header when running the OpenAI server. This benefits traceability: when users pass in the header, they can correlate vLLM logs with the request that generated them, something our users have been asking for. If the header is not passed, behavior remains the same.

Something to consider is the implication of users potentially passing in the same request id multiple times: nothing seems to crash when that happens, but it would be hard to distinguish logs from different requests.
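For example, with the OpenAI Python client a caller can attach the header via `extra_headers` (the model name and id value here are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={"X-Request-Id": "my-trace-id-123"},
)
# vLLM log lines for this request now carry "my-trace-id-123", so they
# can be correlated with the caller's own traces.
```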