[VLM] Support caching in merged multi-modal processor (outside the HF calling loop) #11396

Open
Wants to merge 32 commits into base: main

Conversation

@DarkLight1337 (Member) commented on Dec 21, 2024

This PR moves the caching-related logic (e.g. checking whether an image is already in the cache, and merging cached results with the processed outputs) out of the main loop that applies the HF processor.
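As a rough illustration of the idea (a minimal sketch with hypothetical helper names such as `call_hf_processor` and `hash_item`, not the actual code in this PR), the cache lookup and merge now wrap a single HF processor call instead of living inside the per-item loop:

```python
# Minimal sketch only; names and signatures here are hypothetical.
def apply_with_cache(mm_items, call_hf_processor, cache, hash_item):
    """Process multi-modal items, reusing cached outputs where possible."""
    hashes = [hash_item(item) for item in mm_items]

    # Only cache misses are sent through the (expensive) HF processor.
    misses = [(h, item) for h, item in zip(hashes, mm_items) if h not in cache]
    if misses:
        new_outputs = call_hf_processor([item for _, item in misses])
        for (h, _), output in zip(misses, new_outputs):
            cache[h] = output

    # Every item now has a cached entry; collect the per-item outputs in the
    # original order. Merging them back into batched kwargs is what the
    # per-field "schema" described below defines.
    return [cache[h] for h in hashes]
```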

To enable this, the merged multi-modal processor for each model needs to define the "schema" of the output MultiModalKwargs (in BaseMultiModalProcessor._get_mm_field_tags), which specifies how to obtain the kwargs corresponding to each item (MultiModalFieldTag.get) and how to merge the kwargs of newly processed items with the cached results (MultiModalFieldTag.reduce).
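To make that concrete, here is a hypothetical sketch of what such a per-field tag could look like (the actual MultiModalFieldTag / _get_mm_field_tags interfaces introduced in this PR may differ):

```python
from dataclasses import dataclass

import torch


@dataclass
class BatchedFieldTag:
    """Hypothetical tag for a field whose kwargs are stacked along dim 0."""

    key: str

    def get(self, mm_kwargs: dict, item_idx: int) -> torch.Tensor:
        # Obtain the kwargs slice corresponding to a single item, so it can
        # be stored in (or fetched from) the cache independently.
        return mm_kwargs[self.key][item_idx]

    def reduce(self, per_item: list) -> torch.Tensor:
        # Merge per-item kwargs (some cached, some newly processed) back
        # into a single batched tensor.
        return torch.stack(per_item)


# A model's processor would then declare one tag per output field, e.g.
# [BatchedFieldTag("pixel_values"), BatchedFieldTag("image_grid_thw")].
```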

Alternative implementation of #11341


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@mergify (bot) added the documentation (Improvements or additions to documentation) label on Dec 21, 2024
@DarkLight1337 marked this pull request as ready for review on December 21, 2024 at 10:14
@DarkLight1337 changed the title from "[VLM] Enable caching outside the processing loop" to "[VLM] Enable caching in merged multi-modal processor (outside the HF calling loop)" on Dec 21, 2024
@DarkLight1337 changed the title from "[VLM] Enable caching in merged multi-modal processor (outside the HF calling loop)" to "[VLM] Support caching in merged multi-modal processor (outside the HF calling loop)" on Dec 21, 2024