[Frontend] Use request id from header #10968

Merged: 4 commits merged into vllm-project:main from the request-id branch on Dec 10, 2024

Conversation

joerunde (Collaborator) commented Dec 6, 2024

This is a small proposal to populate the request ids in the engine from the X-Request-Id header when running the OpenAI server. This improves traceability: when users pass the header, they can correlate vLLM logs with the request that generated them, which is something our users have been asking for. If the header is not passed, behavior remains the same.

Something to consider is the implication of users potentially passing in the same request id multiple times. Nothing seems to crash when that happens, but it would make it hard to distinguish logs from different requests.
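
For illustration only, here is a minimal client-side sketch of how a caller might exercise this (not part of the PR; the port, endpoint path, and model name are assumptions):

```python
import uuid

import requests

# Hypothetical OpenAI-compatible vLLM server running locally; the model name
# and port are placeholders chosen for this sketch.
request_id = str(uuid.uuid4())
response = requests.post(
    "http://localhost:8000/v1/completions",
    headers={"X-Request-Id": request_id},  # propagated into engine logs by this change
    json={"model": "facebook/opt-125m", "prompt": "Hello", "max_tokens": 8},
)
# The server already echoes X-Request-Id back, so the id printed here can be
# grepped for in both client-side and vLLM logs.
print(request_id, response.headers.get("X-Request-Id"))
```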

github-actions (bot) commented Dec 6, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

mergify bot added the documentation (Improvements or additions to documentation) label on Dec 6, 2024
DarkLight1337 (Member) left a comment

Seems reasonable to me, thanks for enabling more control over this!

DarkLight1337 added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Dec 7, 2024
cjackal (Contributor) commented Dec 9, 2024

I don't know if it's common sense, but I believe web servers should not be affected by a malicious client.

If a (malicious or purely curious) client sends a 1 MB string as the X-Request-Id header on a streaming chat completion with ~20k expected output tokens, handling that request consumes roughly an extra 20 GB of network bandwidth just from repeating the request id in every SSE response chunk, which sounds like a pretty huge workload for a poor API gateway router. Or a company using vLLM as its main LLM serving engine may keep all LLM responses long-term (to oversee their poor workers, I'm not telling my own experience 🤣), and then its poor DB may suffer from an arbitrarily long VARCHAR request_id column.

Of course there is effectively no such malicious client out there, but you never know...

joerunde (Collaborator, Author) commented Dec 9, 2024

> I don't know if it's common sense, but I believe web servers should not be affected by a malicious client.
>
> If a (malicious or purely curious) client sends a 1 MB string as the X-Request-Id header on a streaming chat completion with ~20k expected output tokens, handling that request consumes roughly an extra 20 GB of network bandwidth just from repeating the request id in every SSE response chunk, which sounds like a pretty huge workload for a poor API gateway router. Or a company using vLLM as its main LLM serving engine may keep all LLM responses long-term (to oversee their poor workers, I'm not telling my own experience 🤣), and then its poor DB may suffer from an arbitrarily long VARCHAR request_id column.
>
> Of course there is effectively no such malicious client out there, but you never know...

Ah, that's a really good point! Though I think that problem exists today, since we already set X-Request-Id on the response; this change only propagates that ID into other logs in the engine.

> Or a company using vLLM as its main LLM serving engine...

Speaking from some experience, in this case users don't get to connect to vLLM directly, and another gateway is responsible for validating requests, rate limiting, setting headers etc.
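
As an aside, if a deployment did want to guard against oversized header values at the gateway or server boundary, a rough middleware-style sketch could look like the following (this is not something the PR adds; the 256-character cap and the FastAPI wiring are assumptions for illustration):

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

MAX_REQUEST_ID_LEN = 256  # arbitrary cap, chosen purely for illustration


@app.middleware("http")
async def limit_request_id_length(request: Request, call_next):
    # Reject (or alternatively truncate) absurdly long X-Request-Id values
    # before they are propagated into logs and streamed SSE responses.
    req_id = request.headers.get("X-Request-Id", "")
    if len(req_id) > MAX_REQUEST_ID_LEN:
        return JSONResponse(status_code=400,
                            content={"error": "X-Request-Id header too long"})
    return await call_next(request)
```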

DarkLight1337 merged commit 980ad39 into vllm-project:main on Dec 10, 2024
50 checks passed
WangErXiao (Contributor) commented:

I think the request_id must be guaranteed to be unique since it is used as a key in many places.

```python
def _base_request_id(raw_request: Request,
                     default: Optional[str] = None) -> Optional[str]:
    """Pulls the request id to use from a header, if provided"""
    default = default or random_uuid()
```
DarkLight1337 (Member) replied:

By default, the request ID is still created from UUID.
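
For context, the helper above presumably finishes by preferring the client-supplied header over the generated default. The sketch below is inferred from the snippet and the PR description rather than copied verbatim from the merged code; the import paths and the exact return expression are assumptions:

```python
from typing import Optional

from fastapi import Request

from vllm.utils import random_uuid  # assumed location of vLLM's uuid helper


def _base_request_id(raw_request: Request,
                     default: Optional[str] = None) -> Optional[str]:
    """Pulls the request id to use from a header, if provided."""
    default = default or random_uuid()
    # Assumed completion: use the client-supplied X-Request-Id when present,
    # otherwise fall back to the freshly generated UUID.
    return raw_request.headers.get("X-Request-Id", default)
```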

joerunde (Collaborator, Author) replied:

@WangErXiao In the context of running the OpenAI server, both the v0 and v1 engines now guard against duplicate request ids, so users will get a 400 if they send in concurrent requests with the same id. See: #11036
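
The guard itself lives in #11036 rather than here, but the idea described above can be illustrated with a small sketch (purely illustrative; the class and method names are invented, not taken from vLLM):

```python
class InFlightRequestTracker:
    """Illustrative duplicate-id guard: reject a request id that is already active."""

    def __init__(self) -> None:
        self._in_flight: set[str] = set()

    def start(self, request_id: str) -> None:
        if request_id in self._in_flight:
            # In the server this surfaces as an HTTP 400 to the client.
            raise ValueError(f"request id {request_id!r} is already in flight")
        self._in_flight.add(request_id)

    def finish(self, request_id: str) -> None:
        self._in_flight.discard(request_id)
```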

joerunde deleted the request-id branch on December 11, 2024 at 21:19
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
Labels
documentation (Improvements or additions to documentation), frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

4 participants