[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug #6425

Conversation

@mzusman (Contributor) commented on Jul 14, 2024

Following #4115, this PR addresses some issues we've found and adds tests for Jamba:

  1. BugFix - the Mamba cache slot mapping is now properly cleaned up when finished_requests_ids + current_running_requests > max_mamba_cache_capacity.
  2. Aborted requests are now added to finished_requests_ids inside the scheduler, so they are forwarded to Jamba's inner state and properly cleaned up (see the sketch after this list).
  3. Added a HasInnerState interface so that SchedulerConfig can be passed to the Jamba modeling file; SchedulerConfig is used by Jamba's inner state to determine the max block capacity for the Mamba cache.
  4. Added a few tests for Jamba.
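
To make the interplay of items 1–3 concrete, here is a minimal, self-contained sketch. This is not the actual vLLM implementation; all class and method names below are illustrative. It shows how aborted request ids can flow through finished_requests_ids into a fixed-capacity Mamba cache so the slot mapping never leaks:

```python
from typing import ClassVar, Protocol


class HasInnerState(Protocol):
    """Marker interface (sketch): models that keep per-request inner state
    and therefore need SchedulerConfig to size their cache."""

    has_inner_state: ClassVar[bool]


class MambaCacheSketch:
    """Fixed-capacity cache mapping request ids to Mamba state slots."""

    def __init__(self, max_capacity: int) -> None:
        # In the PR, the max block capacity is derived from SchedulerConfig.
        self.slot_mapping: dict[str, int] = {}  # request_id -> slot index
        self.free_slots = list(range(max_capacity))

    def release_finished(self, finished_requests_ids: list[str]) -> None:
        """Free the slots of finished requests, including aborted ones,
        so slot_mapping cannot outgrow the cache capacity (item 1)."""
        for request_id in finished_requests_ids:
            slot = self.slot_mapping.pop(request_id, None)
            if slot is not None:
                self.free_slots.append(slot)

    def slot_for(self, request_id: str) -> int:
        """Assign (or reuse) a slot for a running request."""
        if request_id not in self.slot_mapping:
            if not self.free_slots:
                raise RuntimeError("Mamba cache capacity exceeded")
            self.slot_mapping[request_id] = self.free_slots.pop()
        return self.slot_mapping[request_id]


# Scheduler side (item 2, sketch): aborted requests are recorded alongside
# naturally finished ones and forwarded to the model's inner state.
cache = MambaCacheSketch(max_capacity=4)
finished_requests_ids: list[str] = []

cache.slot_for("req-0")
finished_requests_ids.append("req-0")      # request aborted by the client
cache.release_finished(finished_requests_ids)  # slot is reclaimed
assert not cache.slot_mapping
```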


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only trigger the fastcheck CI, which consists of only a small, essential subset of tests to quickly catch errors, with the flexibility to run extra individual tests on top (you can do this by unblocking test steps in the Buildkite run).

A full CI run is still required to merge this PR, so once the PR is ready to go, please make sure to run it. If you need all test signals between PR commits, you can trigger the full CI as well.

To run the full CI, you can do one of the following:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge

🚀

@mzusman requested a review from @DarkLight1337 on July 15, 2024 09:39
@mzusman (Contributor, Author) commented on Jul 15, 2024

/ready

@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jul 15, 2024
@DarkLight1337 (Member) left a comment

Thanks for improving the robustness of the code!

@mzusman (Contributor, Author) commented on Jul 15, 2024

@DarkLight1337 AFAIU pipeline-parallelism-test occasionally fails regardless of this PR, I think it's ready to be merged.

@DarkLight1337 enabled auto-merge (squash) on July 16, 2024 01:23
@DarkLight1337 merged commit 9ad32da into vllm-project:main on Jul 16, 2024
74 checks passed
@youkaichao (Member) commented
> AFAIU pipeline-parallelism-test occasionally fails regardless of this PR

Any examples of this? We need to investigate.

@DarkLight1337 (Member) commented on Jul 16, 2024

> AFAIU pipeline-parallelism-test occasionally fails regardless of this PR
>
> any examples for this? we need to investigate.

https://buildkite.com/vllm/ci-aws/builds/4934#0190b6d0-4616-40e5-87cc-b97823315309

I think it's just a connection issue.

@youkaichao (Member) commented

```
[2024-07-15T14:59:29Z] (RayWorkerWrapper pid=6923) WARNING 07-15 14:59:29 custom_all_reduce.py:127] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
[2024-07-15T15:08:55Z] ERROR
```

From the log, it seems the server was stuck somewhere for roughly nine minutes (14:59:29 to 15:08:55). Difficult to debug.
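
As an aside, the warning above suggests silencing itself by explicitly disabling the custom allreduce kernel. A minimal sketch of passing that flag through vLLM's Python entrypoint; the model name and parallelism degree here are illustrative placeholders, not values from this thread:

```python
from vllm import LLM, SamplingParams

# Explicitly disable the custom allreduce kernel, as the warning suggests,
# on platforms lacking GPU P2P capability. Model name and
# tensor_parallel_size are illustrative placeholders.
llm = LLM(
    model="ai21labs/Jamba-v0.1",
    tensor_parallel_size=2,
    disable_custom_all_reduce=True,
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```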

dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 19, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024