
[Bugfix] AssertionError when releasing waiting requests without KV cache allocation #18829


Closed
wants to merge 3 commits

Conversation

Contributor

@Abatom Abatom commented May 28, 2025

When releasing requests from the waiting queue that have not yet been allocated KV cache, the assertion assert request_id in self.single_type_manager.req_to_blocks fails.

With this fix, when no blocks have been assigned to a request, get_block_ids returns [[]] to keep the return type consistent (list[list[int]]).
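The failure mode and the proposed return value can be sketched with a minimal stand-in for the manager. This is a hedged sketch: class names follow the PR description, but the real vLLM KVCacheManager is more involved.

```python
# Minimal stand-in for the KV cache manager described above.
# Names follow the PR description; this is a sketch, not vLLM's
# actual implementation.
class SingleTypeManager:
    def __init__(self):
        # request_id -> block ids allocated to that request
        self.req_to_blocks: dict[str, list[int]] = {}

class KVCacheManagerSketch:
    def __init__(self):
        self.single_type_manager = SingleTypeManager()

    def get_block_ids(self, request_id: str) -> list[list[int]]:
        # Old behavior (per the PR description):
        #   assert request_id in self.single_type_manager.req_to_blocks
        # This fails for a waiting request that was never allocated KV cache.
        blocks = self.single_type_manager.req_to_blocks.get(request_id)
        if blocks is None:
            # Proposed behavior: no blocks assigned yet, so return [[]]
            # to keep the list[list[int]] type semantics consistent.
            return [[]]
        return [blocks]

manager = KVCacheManagerSketch()
# A request still in the waiting queue has no entry yet:
print(manager.get_block_ids("waiting-req"))  # [[]]
```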

Steps to reproduce:

  • Overload the system with concurrent requests beyond vLLM's throughput limit
  • Allow requests to accumulate in the waiting queue
  • Force-terminate all pending requests → Triggers assert request_id in self.single_type_manager.req_to_blocks


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label May 28, 2025
Collaborator

@heheda12345 heheda12345 left a comment


Thanks for catching this bug. Can you add a test for this corner case?

@Abatom
Contributor Author

Abatom commented May 28, 2025

> Thanks for catching this bug. Can you add a test for this corner case?

Add unit tests or other types of tests?

@heheda12345
Collaborator

Can you add a unit test?

@Abatom
Contributor Author

Abatom commented May 28, 2025

> Can you add a unit test?

Let me think about it. It doesn't seem easy to write, because it requires coordination between the scheduler and the KV cache manager.

Co-authored-by: Chen Zhang <[email protected]>
Signed-off-by: Abatom <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
@Abatom
Contributor Author

Abatom commented May 29, 2025

@heheda12345 PR #18485 addresses the same issue, but I think my approach is better because it resolves the problem at its root.

@heheda12345
Collaborator

> Let me think about it. It doesn't seem easy to write, because it requires coordination between the scheduler and the KV cache manager.

Can you at least give a script to reproduce the bug you mentioned in your PR description?

Contributor

@NickLucche NickLucche left a comment


Hey, thanks for the contribution!

While the proposed change is quite clean for the specific case you mentioned, I am a bit worried about breaking the invariant that get_block_ids is enforcing.
I.e., when I call get_block_ids in .schedule, I want the program to crash if the request is not in the manager.
Returning a new empty block list for a non-existent request might make things much harder to debug, since things will likely break in a weird state right after that.

@Abatom
Contributor Author

Abatom commented May 30, 2025

@NickLucche I understand your concern, but this might affect the normal logic, as I mentioned in the PR description. Queued requests have no KV cache allocated yet. When a queued request is aborted, the abort path reaches get_block_ids and crashes. However, inside schedule there is no way to tell whether a request has KV cache. So what should we do?

@NickLucche
Contributor

One way to address the concern is to separate the get_block_ids behavior depending on where it is called: inside the scheduling loop, the current behavior should be maintained; when aborting a request, we can accept a missing request.
I think the easiest approach is an if that checks the request state, similar to what I implemented in my PR. Otherwise you can call a separate get_block_ids variant that doesn't enforce the invariant.

The core point is that it's safest to maintain the current behavior in every case where get_block_ids is used, except when a request is being evicted.
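One way to sketch this split, assuming a dict-backed manager and illustrative method names (not vLLM's actual API): keep the strict invariant on the scheduling path and relax it only on the abort path.

```python
# Sketch of the reviewer's suggestion: the scheduling path keeps the
# assertion, while a separate abort-path variant tolerates a request
# that was never allocated KV cache. Names here are hypothetical.
class KVCacheManagerSketch:
    def __init__(self):
        self.req_to_blocks: dict[str, list[int]] = {}

    def get_block_ids(self, request_id: str) -> list[list[int]]:
        # Scheduling path: a missing request indicates a bug, so crash loudly.
        assert request_id in self.req_to_blocks, f"unknown request {request_id}"
        return [self.req_to_blocks[request_id]]

    def get_block_ids_for_abort(self, request_id: str) -> list[list[int]]:
        # Abort path: a waiting request may never have been allocated
        # KV cache, so tolerate a missing entry and return [[]].
        return [self.req_to_blocks.get(request_id, [])]
```

Alternatively, a flag such as a hypothetical allow_missing parameter on the existing method would achieve the same split; either way the assertion stays in force everywhere except the abort path.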

@Abatom
Contributor Author

Abatom commented May 30, 2025

@NickLucche Okay, I added some comments on PR #18485.

@heheda12345
Collaborator

@Abatom Can you show a reproduction script and the call stack of the crash? I'm quite confused about why this triggers a crash. If we don't consider KV connectors, self.kv_cache_manager.get_block_ids will only be called when the request is in the waiting queue or the running queue. But as shown here, aborted requests are removed from both queues.

if request.status == RequestStatus.RUNNING:
    self.running.remove(request)
else:
    self.waiting.remove(request)

@Abatom
Contributor Author

Abatom commented May 31, 2025

@heheda12345 I have already provided revision suggestions on PR #18485. According to those modifications, my issue can be resolved, so I'll close this PR for now.

@Abatom Abatom closed this May 31, 2025