
[Bugfix] AssertionError when releasing waiting requests without KV cache allocation #18829


Closed
wants to merge 3 commits

Conversation

Contributor

@Abatom Abatom commented May 28, 2025

When releasing requests from the waiting queue that have not yet been allocated KV cache, the assertion assert request_id in self.single_type_manager.req_to_blocks fails.

With this fix, when no blocks have been assigned to a request, get_block_ids returns [[]] to keep the return type consistent (list[list[int]]).
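The failure mode and the proposed return value can be sketched with a minimal stand-in for the manager. This is a hedged sketch: class names follow the PR description, but the real vLLM KVCacheManager is more involved.

```python
# Minimal stand-in for the KV cache manager described above.
# Names follow the PR description; this is a sketch, not vLLM's
# actual implementation.
class SingleTypeManager:
    def __init__(self):
        # request_id -> block ids allocated to that request
        self.req_to_blocks: dict[str, list[int]] = {}

class KVCacheManagerSketch:
    def __init__(self):
        self.single_type_manager = SingleTypeManager()

    def get_block_ids(self, request_id: str) -> list[list[int]]:
        # Old behavior (per the PR description):
        #   assert request_id in self.single_type_manager.req_to_blocks
        # This fails for a waiting request that was never allocated KV cache.
        blocks = self.single_type_manager.req_to_blocks.get(request_id)
        if blocks is None:
            # Proposed behavior: no blocks assigned yet, so return [[]]
            # to keep the list[list[int]] type semantics consistent.
            return [[]]
        return [blocks]

manager = KVCacheManagerSketch()
# A request still in the waiting queue has no entry yet:
print(manager.get_block_ids("waiting-req"))  # [[]]
```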

Steps to reproduce:

  • Overload the system with concurrent requests beyond vLLM's throughput limit
  • Allow requests to accumulate in the waiting queue
  • Force-terminate all pending requests → Triggers assert request_id in self.single_type_manager.req_to_blocks


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label May 28, 2025
Collaborator

@heheda12345 heheda12345 left a comment


Thanks for catching this bug. Can you add a test for this corner case?

@Abatom
Contributor Author

Abatom commented May 28, 2025

> Thanks for catching this bug. Can you add a test for this corner case?

Add unit tests or other types of tests?

@heheda12345
Collaborator

Can you add a unit test?

@Abatom
Contributor Author

Abatom commented May 28, 2025

> Can you add a unit test?

Let me think about it. It doesn't seem easy to write, because it requires coordination between the scheduler and the KV cache manager.

Co-authored-by: Chen Zhang <[email protected]>
Signed-off-by: Abatom <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
@Abatom
Contributor Author

Abatom commented May 29, 2025

@heheda12345 PR #18485 addresses the same issue, but I think my approach is better because it resolves the problem at its root.

@heheda12345
Collaborator

> Let me think about it. It doesn't seem easy to write, because it requires coordination between the scheduler and the KV cache manager.

Can you at least give a script to reproduce the bug you mentioned in your PR description?

Contributor

@NickLucche NickLucche left a comment


Hey, thanks for the contribution!

While the proposed change is quite clean for the specific case you mentioned, I am a bit worried about breaking the invariant that get_block_ids is enforcing.
I.e., when I call get_block_ids in .schedule, I want the program to crash if the request is not in the manager.
Returning a new empty block list for a non-existent request might make things much harder to debug, since things will likely break in a weird state right after that.

@Abatom
Contributor Author

Abatom commented May 30, 2025

@NickLucche I understand your concern, but this might affect the normal logic, as I mentioned in the PR description. Queued requests have no KV cache allocated yet. When a queued request is aborted, the abort path reaches get_block_ids and crashes. However, inside schedule there is no way to tell whether a request has KV cache. So what should we do?

@NickLucche
Contributor

One way to address the concern is to separate the get_block_ids behavior depending on where it is called: inside the scheduling loop, the current behavior should be maintained; when aborting a request, we can accept a missing request.
I think the easiest approach is an if that checks the request state, similar to what I implemented in my PR. Otherwise you can call a separate get_block_ids variant that doesn't enforce the invariant.

The core point is that it's safest to maintain the current behavior in every case where get_block_ids is used, except when a request is being evicted.
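One way to sketch this split, assuming a dict-backed manager and illustrative method names (not vLLM's actual API): keep the strict invariant on the scheduling path and relax it only on the abort path.

```python
# Sketch of the reviewer's suggestion: the scheduling path keeps the
# assertion, while a separate abort-path variant tolerates a request
# that was never allocated KV cache. Names here are hypothetical.
class KVCacheManagerSketch:
    def __init__(self):
        self.req_to_blocks: dict[str, list[int]] = {}

    def get_block_ids(self, request_id: str) -> list[list[int]]:
        # Scheduling path: a missing request indicates a bug, so crash loudly.
        assert request_id in self.req_to_blocks, f"unknown request {request_id}"
        return [self.req_to_blocks[request_id]]

    def get_block_ids_for_abort(self, request_id: str) -> list[list[int]]:
        # Abort path: a waiting request may never have been allocated
        # KV cache, so tolerate a missing entry and return [[]].
        return [self.req_to_blocks.get(request_id, [])]
```

Alternatively, a flag such as a hypothetical allow_missing parameter on the existing method would achieve the same split; either way the assertion stays in force everywhere except the abort path.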

@Abatom
Contributor Author

Abatom commented May 30, 2025

@NickLucche Okay, I added some comments on PR #18485.

@heheda12345
Collaborator

@Abatom Can you show a reproduction script and the call stack of the crash? I'm quite confused about why this triggers a crash. If we don't consider KV connectors, self.kv_cache_manager.get_block_ids will only be called when the request is in the waiting queue or the running queue. But as shown here, aborted requests are removed from both queues.

if request.status == RequestStatus.RUNNING:
    self.running.remove(request)
else:
    self.waiting.remove(request)

@Abatom
Contributor Author

Abatom commented May 31, 2025

@heheda12345 I have already provided revision suggestions on PR #18485. According to those modifications, my issue can be resolved, so I'll close this PR for now.

@Abatom Abatom closed this May 31, 2025