[Bugfix] AssertionError when releasing waiting requests without KV cache allocation #18829
Conversation
Signed-off-by: Abatom <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs at first. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Thanks for catching this bug. Can you add a test for this corner case?
Add unit tests or other types of tests?
Can you add a unit test?
Let me think about it. It doesn't seem that easy to write, because it requires coordination between the scheduler and the KV cache manager.
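A rough sketch of what such a test could look like, assuming helpers along the lines of `create_scheduler` and `create_requests` from the existing v1 scheduler tests (the helper names and the exact `finish_requests` call are assumptions, not the final test):

```python
from vllm.v1.request import RequestStatus


def test_abort_waiting_request_without_kv_blocks():
    # create_scheduler / create_requests are assumed test helpers, modeled
    # on the fixtures used by the v1 scheduler tests.
    scheduler = create_scheduler()
    request = create_requests(num_requests=1)[0]

    # Enqueue the request but never call scheduler.schedule(), so the
    # KV cache manager never allocates blocks for it.
    scheduler.add_request(request)

    # Releasing a still-waiting request is what used to trip
    # `assert request_id in self.single_type_manager.req_to_blocks`.
    scheduler.finish_requests(request.request_id,
                              RequestStatus.FINISHED_ABORTED)
```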
Co-authored-by: Chen Zhang <[email protected]>
Signed-off-by: Abatom <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
@heheda12345 PR #18485 addresses the same issue, but I think my approach is better because it resolves the problem at its root.
Can you at least give a script to reproduce the bug you mentioned in your PR description?
Hey, thanks for the contribution!
While the proposed change is quite clean for the specific case you mentioned, I am a bit worried about breaking the invariant that get_block_ids is enforcing.
I.e., when I call `get_block_ids` in `.schedule`, I want the program to crash if there's no such request in the manager.
Returning a new empty block for a non-existent request might make things much harder to debug, as things will likely break in a weird state right after that.
@NickLucche I can understand your concern, but this might affect the normal logic, as I mentioned in the PR description. Queued requests do not have KV cache yet; when a queued request is interrupted, it goes down the release path and hits this assertion.
One way to address the concern is to separate the `get_block_ids` behavior depending on where it is called: inside the scheduling loop, the current behavior should be maintained; when aborting the request, we can accept a missing request (see the sketch below). The point is that it's safest to keep the current behavior for every call site of `get_block_ids` except the one where a request is being evicted.
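A minimal sketch of that split, assuming a hypothetical `allow_missing` flag (this is not the actual vLLM API; the body is illustrative):

```python
def get_block_ids(self, request_id: str,
                  allow_missing: bool = False) -> list[list[int]]:
    # `allow_missing` is a hypothetical flag: the scheduling loop would use
    # the strict default, while the abort/eviction path would pass
    # allow_missing=True.
    if request_id not in self.single_type_manager.req_to_blocks:
        if allow_missing:
            # A waiting request that never got KV cache: an empty group
            # keeps the list[list[int]] contract for callers.
            return [[]]
        raise AssertionError(f"request {request_id} has no allocated blocks")
    blocks = self.single_type_manager.req_to_blocks[request_id]
    return [[blk.block_id for blk in blocks]]
```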
@NickLucche Okay, I added some comments on PR #18485.
@Abatom Can you show a reproduction script and the call stack of the crash? I'm quite confused about why this triggers a crash if we don't consider KV connectors (see vllm/v1/core/sched/scheduler.py, lines 870 to 873 at 84ec470).
@heheda12345 I have already provided revision suggestions on PR #18485. With those modifications my issue can be resolved, so I'll close this PR for now.
When releasing requests in the waiting queue that have not yet been allocated KV cache, the assertion `assert request_id in self.single_type_manager.req_to_blocks` fails.
When no blocks have been assigned, `get_block_ids` now returns `[[]]` to keep the return type consistent (`list[list[int]]`).
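A simplified sketch of the change (the surrounding KVCacheManager code is abbreviated, and the exact way block IDs are collected is an assumption):

```python
def get_block_ids(self, request_id: str) -> list[list[int]]:
    # Previously: assert request_id in self.single_type_manager.req_to_blocks
    if request_id not in self.single_type_manager.req_to_blocks:
        # The request was still waiting and never received KV cache blocks;
        # return an empty group instead of asserting.
        return [[]]
    return [[blk.block_id
             for blk in self.single_type_manager.req_to_blocks[request_id]]]
```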
Steps to reproduce:
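The PR itself never included a script, so the following is only a hypothetical sketch of the scenario described above: abort a request while it is still in the waiting queue, before the scheduler allocates KV cache blocks. Whether it actually hits the window is timing-dependent, and the crash was reported with a KV connector in play:

```python
import asyncio

from vllm import AsyncEngineArgs, SamplingParams
from vllm.v1.engine.async_llm import AsyncLLM


async def main():
    engine = AsyncLLM.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m"))

    # Enqueue a request, then abort it before the first scheduling step,
    # i.e. while it is still waiting with no KV cache blocks allocated.
    gen = engine.generate("hello", SamplingParams(max_tokens=8),
                          request_id="r0")
    consume = asyncio.create_task(anext(gen))
    await engine.abort("r0")  # may race ahead of schedule(); timing-dependent
    consume.cancel()


asyncio.run(main())
```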