[V1][VLM] Enable proper chunked prefill for multimodal models #9950
Signed-off-by: Roger Wang <[email protected]>
The description has been updated to reflect the repurposing of this PR.
Closing, as this PR has been broken down into several smaller PRs.
This PR will be merged after #9871, which supports chunked prefill of LMMs natively in V1. The goals of this PR are:
- Refactor the model interface to be compatible with both the V0 and V1 engine implementations (calling the encoder in the model executable vs. calling the encoder in the model runner).
- Add placeholder ranges to the input processors of all LMMs so that placeholder locations can be precisely tracked for chunked prefill in V1 (and potentially prefix caching).
See comment
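To illustrate why precise placeholder ranges matter for chunked prefill, here is a minimal sketch, assuming each multimodal item is described by a hypothetical `PlaceholderRange(offset, length)` giving the position of its placeholder tokens in the prompt. The hypothetical helper `items_in_chunk` reports, for a prefill chunk `[chunk_start, chunk_end)`, which multimodal items overlap the chunk and which slice of each item's encoder output is needed. This is not vLLM's actual implementation; the names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PlaceholderRange:
    """Hypothetical record: where one multimodal item's placeholder tokens sit in the prompt."""
    offset: int  # index of the first placeholder token
    length: int  # number of placeholder tokens


def items_in_chunk(
    ranges: List[PlaceholderRange], chunk_start: int, chunk_end: int
) -> List[Tuple[int, int, int]]:
    """Return (item_index, start_in_item, end_in_item) for every item whose
    placeholder tokens overlap the half-open chunk [chunk_start, chunk_end)."""
    result = []
    for i, r in enumerate(ranges):
        # Intersect the item's token span with the chunk's token span.
        lo = max(r.offset, chunk_start)
        hi = min(r.offset + r.length, chunk_end)
        if lo < hi:
            # Convert prompt positions into positions within this item's embeddings.
            result.append((i, lo - r.offset, hi - r.offset))
    return result


if __name__ == "__main__":
    # A 20-token prompt with two images: tokens [2, 10) and [12, 18).
    ranges = [PlaceholderRange(offset=2, length=8), PlaceholderRange(offset=12, length=6)]
    chunk_size = 8
    for start in range(0, 20, chunk_size):
        print(start, items_in_chunk(ranges, start, start + chunk_size))
```

With a chunk size of 8, the first image is split across the first two chunks and the second image across the last two, so the engine only needs the matching slice of each image's encoder output when scheduling a given chunk.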