[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug #6425

mzusman · 2024-07-14T14:40:54Z

Following #4115, this PR addresses some issues we've found and adds tests for Jamba:

BugFix - Mamba cache slot mapping is now properly cleaned up when finished_requests_ids + current_running_requests > max_mamba_cache_capacity.
Added aborted requests to finished_requests_ids inside the scheduler so they are forwarded to Jamba's inner state and properly cleaned up.
Added a HasInnerState interface so that SchedulerConfig can be passed to the Jamba modeling file.
SchedulerConfig is used by Jamba's inner state to determine the max block capacity for the Mamba cache, according to max_num_seqs.
Added a few tests for Jamba.
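To illustrate the cleanup case described above, here is a minimal sketch of a fixed-capacity slot cache that releases finished requests before assigning slots to running ones. It is not the Jamba implementation: `SlotCache`, `release`, and `step` are hypothetical names, and sizing the cache from `max_num_seqs` is modeled only as a constructor argument.

```python
from typing import Dict, Iterable, List


class SlotCache:
    """Hypothetical stand-in for a fixed-capacity per-request state cache."""

    def __init__(self, max_num_seqs: int) -> None:
        # Assumption: capacity is derived from the scheduler's max_num_seqs.
        self.capacity = max_num_seqs
        self.slot_of: Dict[str, int] = {}
        self.free_slots: List[int] = list(range(max_num_seqs))

    def release(self, finished_request_ids: Iterable[str]) -> None:
        """Return the slots of finished (or aborted) requests to the free pool."""
        for request_id in finished_request_ids:
            slot = self.slot_of.pop(request_id, None)
            if slot is not None:
                self.free_slots.append(slot)

    def step(
        self,
        running_request_ids: List[str],
        finished_request_ids: List[str],
    ) -> Dict[str, int]:
        """Release finished requests first, then map running requests to slots."""
        # Releasing before allocating is what lets finished + running exceed
        # the capacity within a single step without overflowing the cache.
        self.release(finished_request_ids)
        for request_id in running_request_ids:
            if request_id not in self.slot_of:
                if not self.free_slots:
                    raise RuntimeError("state cache is full")
                self.slot_of[request_id] = self.free_slots.pop()
        return {rid: self.slot_of[rid] for rid in running_request_ids}
```

With a capacity of 2, a step that finishes two requests and starts two new ones succeeds because the old slots are freed before the new ones are claimed.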

github-actions · 2024-07-14T14:41:07Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only trigger the fastcheck CI, which consists of only a small, essential subset of tests to quickly catch errors, with the flexibility to run extra individual tests on top (you can do this by unblocking test steps in the Buildkite run).

A full CI run is still required to merge this PR, so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger full CI as well.

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

mzusman · 2024-07-15T11:16:29Z

/ready

DarkLight1337

Thanks for improving the robustness of the code!

mzusman · 2024-07-15T16:49:05Z

@DarkLight1337 AFAIU pipeline-parallelism-test occasionally fails regardless of this PR, so I think it's ready to be merged.

youkaichao · 2024-07-16T04:44:28Z

AFAIU pipeline-parallelism-test occasionally fails regardless of this PR

any examples for this? we need to investigate.

DarkLight1337 · 2024-07-16T04:56:06Z

AFAIU pipeline-parallelism-test occasionally fails regardless of this PR

any examples for this? we need to investigate.

https://buildkite.com/vllm/ci-aws/builds/4934#0190b6d0-4616-40e5-87cc-b97823315309

I think it's just a connection issue.

youkaichao · 2024-07-16T05:03:44Z

[2024-07-15T14:59:29Z] (RayWorkerWrapper pid=6923) WARNING 07-15 14:59:29 custom_all_reduce.py:127] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
[2024-07-15T15:08:55Z] ERROR

from the log, it seems the server is stuck somewhere for 8 minutes. difficult to debug.

Mor Zusman added 11 commits July 14, 2024 16:31

Add Jamba tests, preemption, the turning door bug test and cleanup

4baae80

Add aborted requests ids to the finished_requests_ids

f2ebf51

Add interface for HasInnerState models

f6603d2

Fix the "turning door" bug

607dc75

Add scheduler config to has inner state models

4e00382

Add has inner config to jamba and create mamba cache max capacity according to max_num_seqs

5279333

Add import

17efa06

Remove redundant test

b23af53

Add test

481d62b

Add imports

097eb8b

Format

57559b3
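The commits above include adding aborted request IDs to finished_requests_ids. A minimal sketch of that idea follows, assuming a toy scheduler that records aborted requests in the same list as normally finished ones so that downstream state cleanup sees both. `ToyScheduler` and its methods are hypothetical names for illustration, not the vLLM scheduler API.

```python
from typing import Dict, List


class ToyScheduler:
    """Hypothetical scheduler that reports finished and aborted requests alike."""

    def __init__(self) -> None:
        self.running: Dict[str, str] = {}
        self._finished_requests_ids: List[str] = []

    def add_request(self, request_id: str, prompt: str) -> None:
        self.running[request_id] = prompt

    def finish_request(self, request_id: str) -> None:
        # Normal completion: remember the ID so model state can be freed.
        if self.running.pop(request_id, None) is not None:
            self._finished_requests_ids.append(request_id)

    def abort_request(self, request_id: str) -> None:
        # An aborted request is recorded exactly like a finished one;
        # otherwise its per-request state would never be cleaned up.
        if self.running.pop(request_id, None) is not None:
            self._finished_requests_ids.append(request_id)

    def drain_finished_requests_ids(self) -> List[str]:
        """Hand the accumulated IDs to the caller and reset the list."""
        ids, self._finished_requests_ids = self._finished_requests_ids, []
        return ids
```

The drained list is what a model with inner state would consume once per step to release the corresponding cache entries.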

DarkLight1337 reviewed Jul 14, 2024

tests/models/test_jamba.py (outdated, resolved)

DarkLight1337 reviewed Jul 14, 2024

vllm/model_executor/models/interfaces.py (resolved)

Mor Zusman added 7 commits July 15, 2024 12:11

Switch jamba test (models/batching) to regular tokens comparison

8c6e780

Add _HasInnerStateType to support isinstance with a type and not just the model instance

4e029de

Format

ed1b84e

Revert test to reduce diff

c66aa3d

Put back the batching test

ba9691a

Remove comment

7572450

Format

d25c590
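The _HasInnerStateType commit above is about answering "does this model keep inner state?" for a model class as well as for a model instance. One way to sketch that with `typing.Protocol` is shown below. The names `HasInnerStateSketch` and `has_inner_state` are assumptions made for illustration and may not match what interfaces.py actually defines.

```python
from typing import ClassVar, Protocol, runtime_checkable


@runtime_checkable
class HasInnerStateSketch(Protocol):
    """Hypothetical marker interface: models that carry per-request state."""

    has_inner_state: ClassVar[bool]


def has_inner_state(model_or_cls: object) -> bool:
    """Return True for a stateful model, given either its class or an instance."""
    if isinstance(model_or_cls, type):
        # A class object: read the flag straight off the class.
        return getattr(model_or_cls, "has_inner_state", False) is True
    # An instance: structural check first, then confirm the flag's value,
    # because a runtime protocol check only tests that the attribute exists.
    return (
        isinstance(model_or_cls, HasInnerStateSketch)
        and model_or_cls.has_inner_state is True
    )


class StatefulModel:
    has_inner_state = True


class PlainModel:
    pass
```

Checking the class directly is useful when the decision has to be made before the model is instantiated, for example while choosing which constructor arguments to pass.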

mzusman requested a review from DarkLight1337 July 15, 2024 09:39

github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Jul 15, 2024

DarkLight1337 approved these changes Jul 15, 2024

Mor Zusman added 2 commits July 15, 2024 15:49

Reduce the max_tokens to 20

ca28c52

set batching test to 20

acfa4c7

DarkLight1337 enabled auto-merge (squash) July 16, 2024 01:23

DarkLight1337 merged commit 9ad32da into vllm-project:main Jul 16, 2024
74 checks passed

dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024

[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cl…

2efff72

…eanup bug (vllm-project#6425) Co-authored-by: Mor Zusman <[email protected]>

fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 19, 2024

[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cl…

9d2dde3

…eanup bug (vllm-project#6425) Co-authored-by: Mor Zusman <[email protected]>

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024

[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cl…

9feb0a4

…eanup bug (vllm-project#6425) Co-authored-by: Mor Zusman <[email protected]>

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cl…

20b5a07

…eanup bug (vllm-project#6425) Co-authored-by: Mor Zusman <[email protected]> Signed-off-by: Alvant <[email protected]>
