🐛 fixup request cancellation for v0.6.5 #196
Open
Compatibility fix for vllm v0.6.5
Description
vLLM has removed the cancellation polling from iterate_with_cancellation and merge_async_iterators, moving it up to the HTTP request handler. This PR removes our use of those integrations and adds logging when requests are cancelled.
It turns out that the async gRPC server implementation cancels the task running the handler when the context is cancelled, so there is no need to poll or implement our own cancellation handling anyway. The LLM engine implementations in vLLM also already catch asyncio.CancelledError and abort the work in the engine for us.

TL;DR: We didn't need this anyway.
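
For reviewers, here is a minimal sketch of the cancellation flow described above, using plain asyncio only. `FakeEngine`, `handler`, and `main` are hypothetical names for illustration and are not part of vLLM or this repo. `task.cancel()` stands in for the gRPC server cancelling the handler task, and the `sleep` stands in for generation work.

```python
import asyncio
import logging

logger = logging.getLogger("cancellation_sketch")


class FakeEngine:
    """Hypothetical stand-in for an LLM engine; this is not vLLM's API."""

    def __init__(self) -> None:
        self.active: set[str] = set()
        self.aborted: list[str] = []

    async def generate(self, request_id: str) -> str:
        self.active.add(request_id)
        try:
            await asyncio.sleep(10)  # pretend to do long-running generation
            return "done"
        except asyncio.CancelledError:
            # The engine notices the cancellation and aborts its own work.
            self.aborted.append(request_id)
            raise
        finally:
            self.active.discard(request_id)


async def handler(engine: FakeEngine, request_id: str) -> str:
    """Request handler: no polling, it only logs and re-raises."""
    try:
        return await engine.generate(request_id)
    except asyncio.CancelledError:
        logger.info("Request %s was cancelled", request_id)
        raise  # always propagate so the task really ends up cancelled


async def main() -> FakeEngine:
    engine = FakeEngine()
    task = asyncio.create_task(handler(engine, "req-1"))
    await asyncio.sleep(0.01)  # give the handler time to start
    task.cancel()  # assumption: models the server cancelling the handler task
    try:
        await task
    except asyncio.CancelledError:
        pass
    return engine


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(main())
```

In this sketch, cancelling the task raises asyncio.CancelledError inside the innermost await, so both the engine cleanup and the handler's log line run without any polling loop.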
How Has This Been Tested?
Manually verified that cancellations are logged and that the cancelled work is removed from the engine's queues.
Merge criteria: