[V1][VLM] Enable proper chunked prefill for multimodal models #9950

ywang96 · 2024-11-02T09:07:18Z

~~This PR will be merged after #9871, which supports chunked prefill of LMMs natively in V1.~~

~~The goals of this PR are:~~
~~- Refactor the model interface to be compatible with both the V0 and V1 engine implementations (calling the encoder in the model executable vs. calling the encoder in the model runner)~~
~~- Add placeholder ranges to the input processors of all LMMs so that placeholder locations can be tracked precisely for chunked prefill in V1 (and potentially prefix caching)~~
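The placeholder-range idea in the second goal can be illustrated with a minimal sketch. `PlaceholderRange` and `overlapping_slices` are hypothetical names for illustration only, not vLLM's actual interface. The sketch assumes each multimodal item's placeholder tokens form one contiguous run (`offset`, `length`) in the prompt, and that the encoder emits exactly one embedding per placeholder token, so a prefill chunk only needs the matching slice of each item's encoder output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PlaceholderRange:
    """Hypothetical: one contiguous run of placeholder tokens in the prompt."""
    offset: int  # prompt index of the first placeholder token
    length: int  # number of placeholder tokens (assumed = encoder output length)


def overlapping_slices(
    ranges: List[PlaceholderRange], chunk_start: int, chunk_end: int
) -> List[Tuple[int, int, int]]:
    """For a prefill chunk covering prompt positions [chunk_start, chunk_end),
    return (item_index, enc_start, enc_end) for every multimodal item whose
    placeholder tokens intersect the chunk. enc_start/enc_end are half-open
    indices into that item's encoder output."""
    slices = []
    for i, r in enumerate(ranges):
        # Intersect the placeholder run [offset, offset + length) with the chunk.
        start = max(r.offset, chunk_start)
        end = min(r.offset + r.length, chunk_end)
        if start < end:
            # Convert prompt positions to positions within the encoder output.
            slices.append((i, start - r.offset, end - r.offset))
    return slices


if __name__ == "__main__":
    ranges = [PlaceholderRange(offset=5, length=10), PlaceholderRange(offset=30, length=4)]
    for start in range(0, 40, 8):
        print(start, overlapping_slices(ranges, start, start + 8))
```

With a chunk size of 8 and an image occupying prompt positions 5 to 14, the first chunk consumes encoder outputs 0 to 2 and the second chunk consumes 3 to 9, so the encoder output can be computed once and sliced per chunk. Half-open intervals keep the intersection arithmetic free of off-by-one adjustments.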

See comment below.

Signed-off-by: Roger Wang <[email protected]>

github-actions · 2024-11-02T09:07:29Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the `ready` label to the PR.
- Enable auto-merge.

🚀

ywang96 · 2024-11-07T20:35:32Z

The description has been updated to reflect the repurposing of this PR.

mergify · 2024-11-09T03:32:17Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ywang96.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2024-11-13T12:40:31Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ywang96.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

ywang96 · 2024-11-27T08:08:41Z

Closing, as this PR has been broken down into several smaller PRs.

- update (`4179432`)
- format (`343b467`)

DarkLight1337 mentioned this pull request Nov 2, 2024

[RFC]: Multi-modality Support on vLLM #4194 (Open, 85 tasks)

DarkLight1337 self-assigned this Nov 2, 2024

ywang96 added 4 commits November 3, 2024 22:22

- Merge branch 'main' into chunked-prefill-vlm (`7a212f9`)
- fix assignment order (`0c472f6`)
- update (`3998f9d`)
- refactor (`0ea3209`)

ywang96 changed the title ~~[1/N][Core][VLM] Enable proper chunked prefill for multimodal models~~ [V1][VLM] Enable proper chunked prefill for multimodal models Nov 6, 2024

ywang96 added 2 commits November 6, 2024 11:37

- revert llava changes (`68aebb3`)
- update (`d918b0f`)

- Merge branch 'main' into chunked-prefill-vlm (`afa1abf`)

mergify bot added the needs-rebase label Nov 9, 2024

ywang96 added 2 commits November 8, 2024 21:08

- flatten (`d386818`)
- Merge branch 'main' into chunked-prefill-vlm (`c8b0dd9`)

mergify bot removed the needs-rebase label Nov 9, 2024

ywang96 added 4 commits November 8, 2024 21:49

- fix (`816398c`)
- fix v1 internvl (`01403f9`)
- Merge branch 'main' into chunked-prefill-vlm (`9a5adfc`)
- Fix (`1995472`)

This was referenced Nov 13, 2024

- [Bug]: vllm serve works incorrect for (some) Vision LM models #10286 (Closed)
- [Feature]: Chunked prefill for multimodal models #10290 (Open)

mergify bot added the needs-rebase label Nov 13, 2024

- Merge branch 'main' into chunked-prefill-vlm (`f8396f5`)

mergify bot removed the needs-rebase label Nov 13, 2024

- remove ruff (`e83892e`)

ywang96 closed this Nov 27, 2024
