[VLM] Support caching in merged multi-modal processor #11341

DarkLight1337 · 2024-12-19T17:43:34Z

The V1 multi-modal cache is currently incompatible with the merged multi-modal processor. To mitigate the resulting performance hit, this PR adds a cache inside the merged multi-modal processor itself.

~~Note: Even with this PR, none of the models that currently use merged multi-modal processor actually support fine-grained caching because their HF processors all require text inputs.~~ Now supported by using the inner modality-specific processor.

Signed-off-by: DarkLight1337 <[email protected]>
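To illustrate what fine-grained caching means here, the following is a minimal sketch and not the code from this PR. It assumes each multi-modal item can be processed on its own, without the text prompt, which is the role the inner modality-specific processor plays above. `ItemCache` and `process_one` are hypothetical names, and `hashlib.blake2b` stands in for whatever hash the real implementation uses.

```python
import copy
import hashlib
from typing import Callable


class ItemCache:
    """Hypothetical per-item cache keyed by a hash of the item's bytes."""

    def __init__(self, process_one: Callable[[bytes], dict]) -> None:
        # `process_one` stands in for the inner modality-specific processor.
        self._process_one = process_one
        self._store: dict[str, dict] = {}
        self.misses = 0

    def process(self, items: list[bytes]) -> list[dict]:
        outputs = []
        for item in items:
            key = hashlib.blake2b(item, digest_size=16).hexdigest()
            if key not in self._store:
                # Only items not seen before reach the expensive processor.
                self.misses += 1
                self._store[key] = self._process_one(item)
            # Return a copy so callers cannot mutate the cached entry.
            outputs.append(copy.deepcopy(self._store[key]))
        return outputs
```

Because lookups are per item rather than per request, a request that repeats one image among several new ones only pays for the new ones.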

github-actions · 2024-12-19T17:43:47Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

DarkLight1337 · 2024-12-19T18:19:04Z

vllm/multimodal/processing.py

```diff
+    def _iter_bytes_to_hash(self, key: str, obj: object) -> Iterable[bytes]:
+        # Recursive cases
+        if isinstance(obj, (list, tuple)):
+            for elem in obj:
+                yield from self._iter_bytes_to_hash(key, elem)
+            return
+        if isinstance(obj, dict):
+            for k, v in obj.items():
+                yield from self._iter_bytes_to_hash(f"{key}.{k}", v)
+            return
+
+        # Simple cases
+        if isinstance(obj, str):
+            yield key.encode("utf-8")
+            yield obj.encode("utf-8")
+            return
+        if isinstance(obj, bytes):
+            yield key.encode("utf-8")
+            yield obj
+            return
+        if isinstance(obj, Image):
+            yield key.encode("utf-8")
+            yield obj.tobytes()
+            return
+
+        # Convertible to NumPy arrays
+        if isinstance(obj, torch.Tensor):
+            obj = obj.numpy()
+        if isinstance(obj, (int, float)):
+            obj = np.array(obj)
+        if isinstance(obj, np.ndarray):
+            yield key.encode("utf-8")
+            yield obj.tobytes()
+            return
+
+        msg = f"Unable to hash object of type {type(obj)}"
+        raise NotImplementedError(msg)
+
+    def _hash_kwargs(self, **kwargs: object) -> str:
+        hasher = blake3()
+
+        for k, v in kwargs.items():
+            for item_bytes in self._iter_bytes_to_hash(k, v):
+                hasher.update(item_bytes)
+
+        return hasher.hexdigest()
```


I'm a bit worried about unintentional hash collisions. Is there a better way to do this?
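One possible answer to the question above, offered as a sketch rather than as what this PR ended up doing: the snippet feeds keys and values into the hasher back to back, so `x="ab"` and `xa="b"` produce the same byte stream, as do `"ab"` and `b"ab"` under the same key. Prefixing every chunk with a type tag and its length removes that ambiguity. The example below uses only the standard library, with `hashlib.blake2b` swapped in for `blake3`; the helper names and the tag letters are made up for illustration. Arrays and images are left out, and they would likely also need their dtype and shape (or mode and size) included in the hashed bytes.

```python
import hashlib
import struct
from collections.abc import Iterator


def _chunk(tag: bytes, payload: bytes) -> Iterator[bytes]:
    """Yield a one-byte type tag, an 8-byte length prefix, then the payload."""
    yield tag
    yield struct.pack("<Q", len(payload))
    yield payload


def iter_bytes_to_hash(key: str, obj: object) -> Iterator[bytes]:
    # Containers announce their kind and size, so nesting stays unambiguous.
    if isinstance(obj, (list, tuple)):
        yield from _chunk(b"L", struct.pack("<Q", len(obj)))
        for i, elem in enumerate(obj):
            yield from iter_bytes_to_hash(f"{key}[{i}]", elem)
        return
    if isinstance(obj, dict):
        yield from _chunk(b"D", struct.pack("<Q", len(obj)))
        for k in sorted(obj, key=str):
            yield from iter_bytes_to_hash(f"{key}.{k}", obj[k])
        return

    # Leaves: the key is framed separately from the value.
    yield from _chunk(b"K", key.encode("utf-8"))
    if isinstance(obj, str):
        yield from _chunk(b"S", obj.encode("utf-8"))
    elif isinstance(obj, bytes):
        yield from _chunk(b"Y", obj)
    elif isinstance(obj, bool):  # checked before int: bool subclasses int
        yield from _chunk(b"B", b"\x01" if obj else b"\x00")
    elif isinstance(obj, int):
        yield from _chunk(b"I", str(obj).encode("ascii"))
    elif isinstance(obj, float):
        yield from _chunk(b"F", struct.pack("<d", obj))
    else:
        raise NotImplementedError(f"Unable to hash object of type {type(obj)}")


def hash_kwargs(**kwargs: object) -> str:
    hasher = hashlib.blake2b(digest_size=32)
    # Sorting makes the digest independent of keyword order.
    for k in sorted(kwargs):
        for item_bytes in iter_bytes_to_hash(k, kwargs[k]):
            hasher.update(item_bytes)
    return hasher.hexdigest()
```

With this framing, two inputs hash to the same value only if the underlying hash function itself collides, at the cost of a few extra bytes per chunk.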

Refactor multi-modal processor to support caching

faa9b84

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 requested a review from ywang96 December 19, 2024 17:43

DarkLight1337 changed the title ~~[VLM} Refactor merged multi-modal processor to support caching~~ [VLM] Refactor merged multi-modal processor to support caching Dec 19, 2024

DarkLight1337 changed the title ~~[VLM] Refactor merged multi-modal processor to support caching~~ [VLM] Support caching in merged multi-modal processor Dec 19, 2024

Clean up

9711a15

Signed-off-by: DarkLight1337 <[email protected]>

This was referenced Dec 19, 2024

[RFC]: Multi-modality Support Refactoring #4194

Open

[RFC]: Merge input processor and input mapper for multi-modal models #10114

Open

DarkLight1337 added 2 commits December 19, 2024 18:02

Fix cached result being mutated

29e3fcd

Signed-off-by: DarkLight1337 <[email protected]>

Rename

ab64e85

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 marked this pull request as ready for review December 19, 2024 18:03

Fix docs

81215a2

Signed-off-by: DarkLight1337 <[email protected]>

mergify bot added the documentation Improvements or additions to documentation label Dec 19, 2024

DarkLight1337 added 2 commits December 19, 2024 18:08

Fix a typo

cf52b3b

Signed-off-by: DarkLight1337 <[email protected]>

Fix unhandled sampling rate in initialization

a4a8eb9

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 requested a review from alexm-neuralmagic December 19, 2024 18:15

DarkLight1337 added 2 commits December 19, 2024 18:17

format

c48f7c5

Signed-off-by: DarkLight1337 <[email protected]>

Change the delimiter

b84ff42

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 added 6 commits December 19, 2024 18:24

Fix extra dimension

c3f1bde

Signed-off-by: DarkLight1337 <[email protected]>

Update

32e5197

Signed-off-by: DarkLight1337 <[email protected]>

Use the inner processor to enable fine-grained caching

7264d4e

Signed-off-by: DarkLight1337 <[email protected]>

Make the cache optional

02ea829

Signed-off-by: DarkLight1337 <[email protected]>

Fix invalid kwargs being passed to tokenizer

b981a9d

Signed-off-by: DarkLight1337 <[email protected]>

Fix Phi3V prompt replacement

5dde7d0

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 force-pushed the mm-processor-cache branch from b2dac49 to 5dde7d0 Compare December 20, 2024 04:25

DarkLight1337 added 4 commits December 20, 2024 04:27

Refine

7339ab8

Signed-off-by: DarkLight1337 <[email protected]>

Enable fine-grained caching for audio models

509411d

Signed-off-by: DarkLight1337 <[email protected]>

Add fallback

c0454f5

Signed-off-by: DarkLight1337 <[email protected]>

Fix typo

d50ef03

Signed-off-by: DarkLight1337 <[email protected]>

Fix video processor for Qwen2-VL

81f7d61

Signed-off-by: DarkLight1337 <[email protected]>

ywang96 self-assigned this Dec 20, 2024

DarkLight1337 added 7 commits December 20, 2024 13:31

Merge branch 'main' into mm-processor-cache

13eede3

Fix a bunch of type errors

affbc5c

Signed-off-by: DarkLight1337 <[email protected]>

Fix qwen2-vl

b4ddfb1

Signed-off-by: DarkLight1337 <[email protected]>

Fix

4b3db32

Signed-off-by: DarkLight1337 <[email protected]>

Simplify Pixtral-HF

dafbc7f

Signed-off-by: DarkLight1337 <[email protected]>

Cleanup

38aaff8

Signed-off-by: DarkLight1337 <[email protected]>

Fix Pixtral-HF

5fcb5d6

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 mentioned this pull request Dec 21, 2024

[VLM] Support caching in merged multi-modal processor (outside the HF calling loop) #11396

Open
