[V1] Use input_ids as input for text-only models #11032

WoosukKwon · 2024-12-09T21:09:21Z

Currently, the V1 model runner always passes input_embeds rather than input_ids to the model, for compatibility with multi-modal models. However, this excludes the embedding layer from the CUDA graph and thus causes a slight performance regression (a larger one when TP > 1). This PR addresses the issue by using input_ids for text-only models while continuing to use input_embeds for multi-modal models.
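The dispatch described above can be illustrated with a minimal, framework-free sketch. This is not the PR's actual diff: `ToyModel`, `run_step`, `is_multimodal`, and `mm_embeds` are hypothetical names chosen for illustration, and plain Python lists stand in for tensors. The point it tries to show is that on the text-only path the embedding lookup happens inside the model's forward call (the region a CUDA graph would capture), while on the multi-modal path embeddings are built and merged outside and then passed in.

```python
from typing import Dict, List, Optional

Vector = List[float]


class ToyModel:
    """Hypothetical stand-in for a decoder model; not vLLM's API."""

    def __init__(self, vocab_size: int, hidden_size: int) -> None:
        # Deterministic toy embedding table: row i is [i, i, ...].
        self.embed_table = [[float(i)] * hidden_size for i in range(vocab_size)]

    def embed(self, input_ids: List[int]) -> List[Vector]:
        # Embedding lookup: one row of the table per token id.
        return [list(self.embed_table[t]) for t in input_ids]

    def forward(
        self,
        input_ids: Optional[List[int]] = None,
        inputs_embeds: Optional[List[Vector]] = None,
    ) -> List[Vector]:
        # If no embeddings are supplied, look them up inside forward(),
        # so the lookup is part of the same call as the layers below.
        if inputs_embeds is None:
            if input_ids is None:
                raise ValueError("either input_ids or inputs_embeds is required")
            inputs_embeds = self.embed(input_ids)
        # Stand-in for the transformer layers: double every value.
        return [[2.0 * x for x in row] for row in inputs_embeds]


def run_step(
    model: ToyModel,
    input_ids: List[int],
    is_multimodal: bool,
    mm_embeds: Optional[Dict[int, Vector]] = None,
) -> List[Vector]:
    """Assumed runner logic: choose the input kind based on the model type."""
    if is_multimodal:
        # Multi-modal path: embed outside the model, then overwrite the
        # placeholder positions with (hypothetical) encoder outputs.
        embeds = model.embed(input_ids)
        for pos, vec in (mm_embeds or {}).items():
            embeds[pos] = list(vec)
        return model.forward(inputs_embeds=embeds)
    # Text-only path: pass token ids and let forward() do the lookup.
    return model.forward(input_ids=input_ids)


model = ToyModel(vocab_size=4, hidden_size=2)
text_out = run_step(model, [1, 3], is_multimodal=False)
mm_out = run_step(model, [1, 3], is_multimodal=True, mm_embeds={0: [5.0, 5.0]})
```

With no multi-modal embeddings to merge, both paths produce the same output in this toy setup; the difference is only where the embedding lookup runs.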

Signed-off-by: Woosuk Kwon <[email protected]>

github-actions · 2024-12-09T21:09:33Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀


WoosukKwon · 2024-12-11T18:49:08Z

Chatted offline with @ywang96. While we also need to support input embeddings for text-only models, this PR is good to go since that API is still not ready for V1.


WoosukKwon added 2 commits December 9, 2024 13:06

[V1] Use input_ids as input for text-only models

b67c588

Signed-off-by: Woosuk Kwon <[email protected]>

minor

bd87a19

Signed-off-by: Woosuk Kwon <[email protected]>

WoosukKwon requested review from robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners December 9, 2024 21:09

WoosukKwon added 3 commits December 9, 2024 13:24

minor

64347e9

Signed-off-by: Woosuk Kwon <[email protected]>

minor

c72db1f

Signed-off-by: Woosuk Kwon <[email protected]>

int32

6b976ec

Signed-off-by: Woosuk Kwon <[email protected]>

ywang96 self-assigned this Dec 10, 2024

WoosukKwon added 4 commits December 9, 2024 19:46

Merge branch 'main' into v1-embedding-cg

db0dcd2

Merge branch 'main' into v1-embedding-cg

cef5fcd

mionr

e2baff8

Signed-off-by: Woosuk Kwon <[email protected]>

minor

426d3f6

Signed-off-by: Woosuk Kwon <[email protected]>

WoosukKwon merged commit d643c2a into main Dec 11, 2024
22 of 25 checks passed

WoosukKwon deleted the v1-embedding-cg branch December 11, 2024 18:49

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[V1] Use input_ids as input for text-only models (vllm-project#11032)

28fc47d

Signed-off-by: Woosuk Kwon <[email protected]>

ywang96 mentioned this pull request Dec 19, 2024

[Core] generate from input embeds #6869

Closed
