[Bugfix] Fix request disconnect check when BaseHTTPMiddleware is present #11096
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR fixes #10087, by using a different method to check if a request has disconnected (instead of starlette's
Request.is_disconnected()
, which doesn't work as expected in case a BaseHTTPMiddleware is added to the server - see here - encode/starlette#2094).The PR uses the method suggested in encode/starlette#2094 (comment)
Added a test that fails on
main
, and is fixed with this PRBenchmark
Model: Qwen/Qwen2.5-1.5B-Instruct
Hardware: single H100
Serve run command:
vllm serve Qwen/Qwen2.5-1.5B-Instruct
Endpoint:
v1/completions
(benchmark run command:
python3 benchmark_serving.py --model Qwen/Qwen2.5-1.5B-Instruct --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
)Conclusion: The fix doesn't hurt performance at all. I guess this is because
request.recieve()
used here is generally non-blocking