[Bugfix] Fix request cancellation without polling #11190
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these: …
Turns out the old async llm engine needed to check for cancellation errors for this new method to work, so I shuffled around the base … Working now on a test for the openai api server; that doesn't necessarily need to block merging if we're on a tight schedule, but it should be in soon 🤞 @mgoin @simon-mo Can I get a …
Okay, the test for request cancellation with the openai server is in as well. It overloads the server with a ton of requests and then cancels them, ensuring the server can still respond afterwards. I would have rather been able to do something like check the …
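For illustration, here is the rough shape of such a test as a hedged sketch: the endpoint, port, model name, and request counts are placeholders, not the values used in the actual test, and whether cancelling the client task aborts the request server-side depends on the HTTP client closing the connection on cancellation.

```python
import asyncio

import httpx


async def main() -> None:
    # Placeholder payload; long generations pile up engine load without
    # needing a huge number of concurrent requests.
    payload = {"model": "my-model", "prompt": "Hello", "max_tokens": 512}

    async with httpx.AsyncClient(base_url="http://localhost:8000",
                                 timeout=30.0) as client:
        # Flood the server, then cancel every in-flight request.
        tasks = [
            asyncio.create_task(client.post("/v1/completions", json=payload))
            for _ in range(50)
        ]
        await asyncio.sleep(1)  # let the requests reach the engine
        for task in tasks:
            task.cancel()  # closing the connection should abort it server-side
        await asyncio.gather(*tasks, return_exceptions=True)

        # The server should still respond promptly after all the aborts.
        response = await client.post("/v1/completions",
                                     json={**payload, "max_tokens": 5})
        assert response.status_code == 200


asyncio.run(main())
```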
Even though it does have the potential to be a footgun (as you documented), I like the usage of @with_cancellation, so LGTM pending green!
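For context, applying the decorator to a route looks roughly like this; the route path and handler name are illustrative, and this assumes a `with_cancellation` decorator like the one sketched at the end of this page:

```python
from fastapi import APIRouter, Request

router = APIRouter()


# The decorator sits below the route decorator so FastAPI registers the
# wrapped, cancellable function. The handler must accept the raw request
# so the wrapper can listen for the disconnect message.
@router.post("/v1/completions")
@with_cancellation
async def create_completion(raw_request: Request):
    ...  # long-running engine work happens here
```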
Ah shoot, the test was too much for the smaller cards we run on in CI, and it failed even though the logs do show plenty of requests being aborted :( I'll dial it back some. I don't want this to be flaky, so I think fewer requests with more tokens each would be better at piling up load without burdening the server with handling so many aborts.
See discussion and alternate solution in #11096
I came to agree with @jakkdl in encode/starlette#2094 and the linked issues that polling for disconnects with `request.is_disconnected()` introduces more problems than it's worth. Instead, we can use the pattern that's already in `StreamingResponse` and have a separate async task wait for a disconnect message, cancelling our work if one is received. The key here is that a `StreamingResponse` is able to safely consume all new messages because the request body has already been read. Our request handlers have the same guarantee, since fastapi first reads and parses the request and builds a pydantic object for us before invoking our handler.

This PR implements a decorator for our fastapi handlers that will cancel them if a disconnect message is received while they are running; a sketch of the pattern follows at the end of this description. This is implemented with asyncio directly instead of with anyio, because the rest of the code base assumes asyncio.

The advantages here are: …
Disadvantages:
- This does not work for the handler in `entrypoints/api_server.py`, because that handler reads the request body itself, so our one cancellation test in `tests/async_engine/test_api_server.py` fails. I'll need to write a new one, but it's 6pm on a Friday 🙃

I manually verified that this works to cancel both streaming and non-streaming requests; let me know what y'all think of doing this instead.
FIX #10087
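For reference, a minimal sketch of the decorator pattern described above, assuming the handler receives the raw starlette `Request` as a `raw_request` keyword argument; the names here are illustrative and the PR's actual implementation may differ.

```python
import asyncio
import functools

from fastapi import Request


async def listen_for_disconnect(request: Request) -> None:
    """Return once the client sends an http.disconnect message.

    Consuming messages here is safe only because the request body was
    already read (fastapi parsed it into a pydantic model before the
    handler was invoked).
    """
    while True:
        message = await request.receive()
        if message["type"] == "http.disconnect":
            return


def with_cancellation(handler):
    """Run the handler and a disconnect listener concurrently;
    whichever finishes first cancels the other."""

    @functools.wraps(handler)
    async def wrapper(*args, raw_request: Request, **kwargs):
        handler_task = asyncio.create_task(
            handler(*args, raw_request=raw_request, **kwargs))
        cancel_task = asyncio.create_task(listen_for_disconnect(raw_request))

        done, pending = await asyncio.wait(
            {handler_task, cancel_task},
            return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()

        if handler_task in done:
            return handler_task.result()
        # The client disconnected: the handler was cancelled and there is
        # no one left to receive a response.
        return None

    return wrapper
```

The footgun mentioned in review is visible in the sketch: the wrapped handler can be cancelled at any `await` point, so it must be safe to abandon mid-flight.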