[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug #6425
according to max_num_seqs
👋 Hi! Thank you for contributing to the vLLM project. Full CI run is still required to merge this PR so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger full CI as well. To run full CI, you can do one of these:
🚀 |
/ready |
Thanks for improving the robustness of the code!
@DarkLight1337 AFAIU pipeline-parallelism-test occasionally fails regardless of this PR, I think it's ready to be merged.
any examples for this? we need to investigate.
https://buildkite.com/vllm/ci-aws/builds/4934#0190b6d0-4616-40e5-87cc-b97823315309 I think it's just a connection issue.
from the log, it seems the server is stuck somewhere for 8 minutes. difficult to debug.
…eanup bug (vllm-project#6425) Co-authored-by: Mor Zusman <[email protected]>
…eanup bug (vllm-project#6425) Co-authored-by: Mor Zusman <[email protected]> Signed-off-by: Alvant <[email protected]>
Following #4115, this PR addresses some issues we've found and adds tests to Jamba:
- Handle the case where `finished_requests_ids` + `current_running_requests` > `max_mamba_cache_capacity`.
- Add aborted requests to `.finished_requests_ids` inside the scheduler so they will be forwarded to Jamba's inner state to be properly cleaned up.
- Add a `HasInnerState` interface to be able to pass `SchedulerConfig` to the Jamba modeling file. `SchedulerConfig` is used by Jamba's inner state to determine the max block capacity for the Mamba cache.
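
The interplay between aborted requests, `finished_requests_ids`, and the Mamba cache capacity described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `SchedulerSketch`, `MambaCacheSketch`, and all of their methods are hypothetical names invented for illustration, and the real vLLM scheduler and Jamba inner state are considerably more involved. The sketch only assumes that the scheduler reports aborted requests as finished, and that the cache frees the slots of finished requests before assigning slots to running ones.

```python
from __future__ import annotations


class SchedulerSketch:
    """Hypothetical scheduler stand-in (not vLLM's Scheduler)."""

    def __init__(self) -> None:
        self.running: list[str] = []
        self._finished_requests_ids: list[str] = []

    def add(self, req_id: str) -> None:
        self.running.append(req_id)

    def abort(self, req_id: str) -> None:
        # An aborted request is recorded as finished so that downstream
        # state (here, the cache) can release its resources.
        if req_id in self.running:
            self.running.remove(req_id)
            self._finished_requests_ids.append(req_id)

    def schedule(self) -> tuple[list[str], list[str]]:
        # Hand out the finished ids exactly once, then reset the list.
        finished, self._finished_requests_ids = self._finished_requests_ids, []
        return list(self.running), finished


class MambaCacheSketch:
    """Hypothetical per-request slot table (not Jamba's actual inner state)."""

    def __init__(self, max_capacity: int) -> None:
        # In the PR the capacity is derived from SchedulerConfig; here it is
        # simply passed in as an integer (an assumption of this sketch).
        self.max_capacity = max_capacity
        self.slot_of: dict[str, int] = {}
        self.free_slots: list[int] = list(range(max_capacity))

    def prepare_step(self, running: list[str], finished: list[str]) -> list[int]:
        # Release finished (including aborted) requests first, so that
        # finished + running never has to fit in the cache at the same time.
        for req_id in finished:
            slot = self.slot_of.pop(req_id, None)
            if slot is not None:
                self.free_slots.append(slot)
        # Then assign a slot to every running request that lacks one.
        for req_id in running:
            if req_id not in self.slot_of:
                if not self.free_slots:
                    raise RuntimeError("Mamba cache is full")
                self.slot_of[req_id] = self.free_slots.pop()
        return [self.slot_of[req_id] for req_id in running]


scheduler = SchedulerSketch()
cache = MambaCacheSketch(max_capacity=2)

scheduler.add("a")
scheduler.add("b")
cache.prepare_step(*scheduler.schedule())  # both slots are now in use

scheduler.abort("a")  # "a" is reported as finished on the next step
scheduler.add("c")
slots = cache.prepare_step(*scheduler.schedule())  # "c" reuses the slot of "a"
```

If `abort` did not append to `_finished_requests_ids`, the slot held by `"a"` would never be freed in this sketch, and the second `prepare_step` call would raise because the cache is already at capacity.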