[VLM] Merged multi-modal processor for Pixtral #12211

Flechman · 2025-01-20T09:25:12Z

This PR implements the merged multi-modal processor for Pixtral, as a contribution to the V1 re-architecture for multi-modal models.

Signed-off-by: remi <[email protected]>

github-actions · 2025-01-20T09:25:23Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

vllm/model_executor/models/pixtral.py

Signed-off-by: remi <[email protected]>

DarkLight1337 · 2025-02-05T08:11:18Z

#12767 should make it easier to pass the image token ID

mergify · 2025-02-13T03:53:41Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Flechman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: remi <[email protected]>

DarkLight1337 · 2025-03-08T06:16:32Z

Any update on this?

mergify · 2025-03-08T06:21:42Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Flechman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: remi <[email protected]>

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 · 2025-03-10T15:23:42Z

Thanks for updating! I'm going to push some code improvements / fixes to get this merged sooner.

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 · 2025-03-10T18:53:47Z

I've updated the processor to more closely follow the design of the other models. However, there is still a mismatch in the placeholder tokens. It would be great if anyone could help debug this in the meantime!

DarkLight1337 · 2025-03-10T18:54:27Z

vllm/transformers_utils/tokenizer.py

-    return tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
+    if skip_special_tokens is not None:
+        return tokenizer.decode(token_ids,
+                                skip_special_tokens=skip_special_tokens)


This change lets us avoid passing skip_special_tokens=False to MistralTokenizer, which does not allow it.
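For readers following along, here is a minimal, self-contained sketch of the pattern in the diff above. It is not the actual vLLM code: the function name `decode_tokens`, the fallback branch (the diff only shows the `if` branch), and the `StrictTokenizer` class are assumptions made for illustration. `StrictTokenizer` is a hypothetical stand-in for a tokenizer that rejects `skip_special_tokens=False`.

```python
from typing import Optional


class StrictTokenizer:
    """Hypothetical stand-in for a tokenizer that rejects skip_special_tokens=False."""

    def decode(self, token_ids: list[int], skip_special_tokens: bool = True) -> str:
        if not skip_special_tokens:
            raise ValueError("skip_special_tokens=False is not supported")
        # Toy decoding: join the IDs so the result is easy to inspect.
        return " ".join(str(t) for t in token_ids)


def decode_tokens(
    tokenizer,
    token_ids: list[int],
    *,
    skip_special_tokens: Optional[bool] = None,
) -> str:
    # Forward the flag only when the caller set it explicitly.
    if skip_special_tokens is not None:
        return tokenizer.decode(token_ids,
                                skip_special_tokens=skip_special_tokens)

    # Assumed fallback (not shown in the diff): use the tokenizer's own default.
    return tokenizer.decode(token_ids)


if __name__ == "__main__":
    tok = StrictTokenizer()
    print(decode_tokens(tok, [1, 2, 3]))
    print(decode_tokens(tok, [1, 2, 3], skip_special_tokens=True))
```

With `None` as the default, a caller that has no opinion on special tokens never sends the keyword at all, so a strict tokenizer only sees the flag when someone explicitly asks for it.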

Signed-off-by: DarkLight1337 <[email protected]>

Adjustment first version

fbe6a9d

Signed-off-by: remi <[email protected]>

This was referenced Jan 20, 2025

[RFC]: Multi-modality Support on vLLM #4194

Open

[RFC]: Merge input processor and input mapper for multi-modal models #10114

Open

mgoin reviewed Jan 20, 2025

vllm/model_executor/models/pixtral.py (outdated)

ywang96 assigned ywang96 and DarkLight1337 Jan 20, 2025

Flechman added 2 commits January 22, 2025 14:59

Merge with main

46c142f

Revert changes

4af1716

Signed-off-by: remi <[email protected]>

Flechman force-pushed the pixtral-mm-processor branch from 41c423a to 4af1716 Compare January 26, 2025 12:19

Flechman added 5 commits January 26, 2025 12:32

Add pixtral dummy inputs builder

8a75f3a

Signed-off-by: remi <[email protected]>

Fix naming

2e346d3

Signed-off-by: remi <[email protected]>

HF processor not supported

c9c082b

Signed-off-by: remi <[email protected]>

Add tokenizer mode

869a620

Signed-off-by: remi <[email protected]>

Override pixtral processor apply

a6392cb

Signed-off-by: remi <[email protected]>

mergify bot added the needs-rebase label Feb 13, 2025

Merge with main

c1b78f4

Signed-off-by: remi <[email protected]>

mergify bot removed the needs-rebase label Feb 14, 2025

mergify bot added the multi-modality Related to multi-modality (#4194) label Mar 8, 2025

mergify bot added the needs-rebase label Mar 8, 2025

Merge with main

9d70fba

Signed-off-by: remi <[email protected]>

mergify bot removed the needs-rebase label Mar 9, 2025

Flechman added 2 commits March 9, 2025 21:57

Add caching mechanism

cafe731

Signed-off-by: remi <[email protected]>

Add tokenization

4c8f915

Signed-off-by: remi <[email protected]>

Flechman marked this pull request as ready for review March 9, 2025 22:51

Cleanup previous processor

c1bef45

Signed-off-by: remi <[email protected]>

Update based on latest PRs

d5fd5cd

Signed-off-by: DarkLight1337 <[email protected]>

Draft HF-compatible processor

9b0e436

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 requested review from DarkLight1337 and ywang96 as code owners March 10, 2025 18:50

mergify bot added the documentation Improvements or additions to documentation label Mar 10, 2025

DarkLight1337 reviewed Mar 10, 2025

Add sanity check

a8d00e8

Signed-off-by: DarkLight1337 <[email protected]>
