[Bugfix] Fix request disconnect check when BaseHTTPMiddleware is present #11096

tomeras91 · 2024-12-11T11:23:05Z

This PR fixes #10087 by using a different method to check whether a request has disconnected. Starlette's `Request.is_disconnected()` does not work as expected when a `BaseHTTPMiddleware` is added to the server (see encode/starlette#2094).

The PR uses the method suggested in a comment on encode/starlette#2094.

Added a test that fails on main and passes with this PR.

Benchmark

Model: Qwen/Qwen2.5-1.5B-Instruct
Hardware: single H100
Serve run command: vllm serve Qwen/Qwen2.5-1.5B-Instruct

Endpoint: v1/completions
(benchmark run command: python3 benchmark_serving.py --model Qwen/Qwen2.5-1.5B-Instruct --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json)

| Branch | Throughput (requests/s) | Mean TTFT (ms) | Mean TPOT (ms) | Total generated tokens |
|---|---|---|---|---|
| main (commit `82c73fd`) | 52.12 | 6117.88 | 24.92 | 189868 |
| PR (commit `0ba0ff8`) | 52.52 | 6064.14 | 24.20 | 190241 |
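To put the table in relative terms, the snippet below computes the percentage change from main to the PR branch for each metric, using the numbers from the table unchanged. Only one run per branch is reported, so differences this small may well be within run-to-run noise.

```python
# Benchmark numbers copied from the table (main vs. PR branch).
main = {"throughput_rps": 52.12, "mean_ttft_ms": 6117.88,
        "mean_tpot_ms": 24.92, "generated_tokens": 189868}
pr = {"throughput_rps": 52.52, "mean_ttft_ms": 6064.14,
      "mean_tpot_ms": 24.20, "generated_tokens": 190241}


def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


for metric in main:
    print(f"{metric}: {relative_change(main[metric], pr[metric]):+.2f}%")
# throughput_rps: +0.77%
# mean_ttft_ms: -0.88%
# mean_tpot_ms: -2.89%
# generated_tokens: +0.20%
```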

Conclusion: the fix shows no measurable performance regression. My guess is that this is because the request.receive() call used here is generally non-blocking.


github-actions · 2024-12-11T11:23:17Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of the following:

- Add the `ready` label to the PR
- Enable auto-merge

🚀

tomeras91 · 2024-12-11T11:25:51Z

CC @njhill


DarkLight1337 · 2024-12-18T02:14:18Z

As per offline discussion, we have decided to fix it via #11190 instead.

tomeras91 added 2 commits December 11, 2024 12:15

introduce a patch to request.is_disconnected that works also in the presence of BaseHTTPMiddleware
Signed-off-by: Tomer Asida <[email protected]>
f8c1365

Add test case that fails on main and is solved now
Signed-off-by: Tomer Asida <[email protected]>
0ba0ff8

tomeras91 requested review from njhill, robertgshaw2-redhat and simon-mo as code owners December 11, 2024 11:23

mergify bot added the frontend label Dec 11, 2024

DarkLight1337 reviewed Dec 11, 2024


vllm/utils.py (outdated, resolved)

move is_disconnected_patch to a new file - vllm/entrypoints/utils.py

728acc4

Signed-off-by: Tomer Asida <[email protected]>

joerunde mentioned this pull request Dec 14, 2024

[Bugfix] Fix request cancellation without polling #11190

Merged

DarkLight1337 closed this Dec 18, 2024
