[V1] Initial support of multimodal models for V1 re-arch #10699

Merged
merged 31 commits into from
Dec 8, 2024
246d75b
internvl
ywang96 Nov 27, 2024
2a081bb
fix token id
ywang96 Nov 27, 2024
e4d6bb2
Merge branch 'vllm-project:main' into v1-initial
ywang96 Nov 28, 2024
94d66cc
Pixtral
ywang96 Nov 30, 2024
79f24c6
use special ids
ywang96 Nov 30, 2024
7a88433
comment
ywang96 Nov 30, 2024
af1dbab
cleanup for pixtral
ywang96 Nov 30, 2024
39dd4f2
Merge branch 'vllm-project:main' into v1-initial
ywang96 Nov 30, 2024
6d0df5a
qwen2vl
ywang96 Dec 1, 2024
124b0c1
Merge branch 'vllm-project:main' into v1-initial
ywang96 Dec 2, 2024
8c4da46
molmo
ywang96 Dec 2, 2024
3e3a346
minor changes on interfaces
ywang96 Dec 2, 2024
1c50613
typo
ywang96 Dec 2, 2024
6d8ddff
pad
ywang96 Dec 2, 2024
7ddf7d9
Merge branch 'vllm-project:main' into v1-initial
ywang96 Dec 2, 2024
f1fa769
remove print
ywang96 Dec 3, 2024
ee8e0ae
Merge branch 'vllm-project:main' into v1-initial
ywang96 Dec 3, 2024
319e689
Merge branch 'vllm-project:main' into v1-initial
ywang96 Dec 4, 2024
77256d9
change check order
ywang96 Dec 4, 2024
bdd8da6
Merge branch 'main' into v1-initial
ywang96 Dec 5, 2024
e32efd5
Merge branch 'vllm-project:main' into v1-initial
ywang96 Dec 6, 2024
0176b7b
molmo
ywang96 Dec 6, 2024
69f4e5f
fix launch args
ywang96 Dec 6, 2024
8b7e746
fix qwen2-vl
ywang96 Dec 6, 2024
bb15b01
typing
ywang96 Dec 6, 2024
610e662
add documentation
ywang96 Dec 6, 2024
2b5fdd7
minor fix
ywang96 Dec 6, 2024
a5a38dd
typehint
ywang96 Dec 6, 2024
fbf9cd0
Merge branch 'main' into v1-initial
ywang96 Dec 7, 2024
8d1d80e
iterate
ywang96 Dec 8, 2024
4a79255
revert changes in qwen2vl
ywang96 Dec 8, 2024
17 changes: 17 additions & 0 deletions vllm/model_executor/models/utils.py
@@ -412,6 +412,23 @@ def merge_multimodal_embeddings(
Merge ``multimodal_embeddings`` into ``inputs_embeds`` by overwriting the
positions in ``inputs_embeds`` corresponding to placeholder tokens in
``input_ids``.

``placeholder_token_id`` can be a list of token ids (e.g., token ids
of img_start, img_break, and img_end tokens) when needed. In that case,
the order of these tokens in ``input_ids`` MUST MATCH the order of
their embeddings in ``multimodal_embeddings``, since we need to
slice-merge instead of individually scattering.

For example, if input_ids is "TTTTTSIIIBIIIBIIIETTT", where
- T is text token
- S is image start token
- I is image embedding token
- B is image break token
- E is image end token.

Then the image embeddings (that correspond to I's) from vision encoder
must be padded with embeddings of S, B, and E in the same order as they
appear in input_ids for a correct embedding merge.

Note:
This updates ``inputs_embeds`` in place.
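
The ordering requirement in the docstring above can be illustrated with a small sketch. This is not the code from this PR: plain Python lists stand in for tensors, string labels stand in for embedding vectors, and the function name `merge_by_placeholder_order` and the token id constants are hypothetical names chosen for the example. It only shows why the multimodal embeddings must already be interleaved with the S, B, and E embeddings in the same order as in `input_ids`.

```python
from typing import List, Sequence, TypeVar

Emb = TypeVar("Emb")

# Hypothetical token ids for the example (not real vocabulary ids).
T, S, I, B, END = 0, 1, 2, 3, 4


def merge_by_placeholder_order(
    input_ids: Sequence[int],
    inputs_embeds: List[Emb],
    multimodal_embeddings: Sequence[Emb],
    placeholder_token_ids: Sequence[int],
) -> List[Emb]:
    """Overwrite placeholder positions in order; mutates inputs_embeds."""
    placeholders = set(placeholder_token_ids)
    # Positions of every placeholder token, in sequence order.
    positions = [i for i, tok in enumerate(input_ids) if tok in placeholders]
    if len(positions) != len(multimodal_embeddings):
        raise ValueError(
            f"expected {len(positions)} multimodal embeddings, "
            f"got {len(multimodal_embeddings)}"
        )
    # The k-th embedding goes to the k-th placeholder position, so the
    # caller must supply embeddings in the same order as input_ids.
    for pos, emb in zip(positions, multimodal_embeddings):
        inputs_embeds[pos] = emb
    return inputs_embeds


# "TTSIIBIIET": text, start, two image rows separated by a break, end, text.
input_ids = [T, T, S, I, I, B, I, I, END, T]
inputs_embeds = ["t0", "t1", "?", "?", "?", "?", "?", "?", "?", "t9"]
# Image embeddings already padded with S, B, and E embeddings, in order.
multimodal_embeddings = ["s", "i0", "i1", "b", "i2", "i3", "e"]

merged = merge_by_placeholder_order(
    input_ids, inputs_embeds, multimodal_embeddings, [S, I, B, END]
)
```

If the vision encoder output were passed without the S, B, and E embeddings, the count check would fail; if they were appended at the end instead of interleaved, the merge would succeed but place embeddings at the wrong positions, which is the failure mode the docstring warns about.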