[VLM] Support caching in merged multi-modal processor #11341
Signed-off-by: DarkLight1337 <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
vllm/multimodal/processing.py
Outdated
    def _iter_bytes_to_hash(self, key: str, obj: object) -> Iterable[bytes]:
        # Recursive cases
        if isinstance(obj, (list, tuple)):
            for elem in obj:
                yield from self._iter_bytes_to_hash(key, elem)
            return
        if isinstance(obj, dict):
            for k, v in obj.items():
                yield from self._iter_bytes_to_hash(f"{key}.{k}", v)
            return

        # Simple cases
        if isinstance(obj, str):
            yield key.encode("utf-8")
            yield obj.encode("utf-8")
            return
        if isinstance(obj, bytes):
            yield key.encode("utf-8")
            yield obj
            return
        if isinstance(obj, Image):
            yield key.encode("utf-8")
            yield obj.tobytes()
            return

        # Convertible to NumPy arrays
        if isinstance(obj, torch.Tensor):
            obj = obj.numpy()
        if isinstance(obj, (int, float)):
            obj = np.array(obj)
        if isinstance(obj, np.ndarray):
            yield key.encode("utf-8")
            yield obj.tobytes()
            return

        msg = f"Unable to hash object of type {type(obj)}"
        raise NotImplementedError(msg)

    def _hash_kwargs(self, **kwargs: object) -> str:
        hasher = blake3()

        for k, v in kwargs.items():
            for item_bytes in self._iter_bytes_to_hash(k, v):
                hasher.update(item_bytes)

        return hasher.hexdigest()
I'm a bit worried about unintentional hash collisions. Is there a better way to do this?
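To make the concern concrete, here is a minimal sketch of one way the scheme in the snippet above could collide: keys and values are fed to the hasher back to back with no delimiters, so different (key, value) splits of the same byte stream hash identically. The snippet also hashes `str` and `bytes` the same way and drops array shape and dtype, which may be further sources of ambiguity. The helper names `naive_digest` and `framed_digest` are hypothetical, and `hashlib.blake2b` is used only as a standard-library stand-in for `blake3`. Length-prefixing each field is one possible mitigation, not necessarily the one this PR adopted.

```python
import hashlib
import struct


def naive_digest(items):
    """Mimic the snippet above: key bytes, then value bytes, no delimiters."""
    hasher = hashlib.blake2b()  # stand-in for blake3 (assumption)
    for key, value in items:
        hasher.update(key.encode("utf-8"))
        hasher.update(value.encode("utf-8"))
    return hasher.hexdigest()


def framed_digest(items):
    """Prefix every field with its byte length so boundaries are unambiguous."""
    hasher = hashlib.blake2b()
    for key, value in items:
        for field in (key.encode("utf-8"), value.encode("utf-8")):
            # 8-byte little-endian length, then the field itself
            hasher.update(struct.pack("<Q", len(field)))
            hasher.update(field)
    return hasher.hexdigest()


# Two different inputs that concatenate to the same byte stream b"abc"
a = [("ab", "c")]
b = [("a", "bc")]

collides_naive = naive_digest(a) == naive_digest(b)
collides_framed = framed_digest(a) == framed_digest(b)
print(collides_naive, collides_framed)
```

With these inputs the undelimited variant produces equal digests, while the length-prefixed variant does not.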
b2dac49 to 5dde7d0
V1 multi-modal cache is currently incompatible with the merged multi-modal processor. To mitigate the performance hit, this PR adds a cache inside the merged multi-modal processor.
Note: Even with this PR, none of the models that currently use the merged multi-modal processor actually support fine-grained caching, because their HF processors all require text inputs. Update: this is now supported by using the inner modality-specific processor.
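As a rough illustration of what fine-grained (per-item) caching inside a processor could look like, the sketch below caches the output of an expensive per-item function under a content hash, so a repeated item is processed only once even when it appears in a different request. Everything here is an assumption made for illustration: `CachedItemProcessor`, `fake_image_processor`, and the `misses` counter are hypothetical names, not vLLM APIs, and `hashlib.blake2b` stands in for `blake3`.

```python
import hashlib
from typing import Callable, Dict, List


class CachedItemProcessor:
    """Hypothetical wrapper that caches per-item processor outputs."""

    def __init__(self, process_item: Callable[[bytes], List[int]]) -> None:
        self._process_item = process_item
        self._cache: Dict[str, List[int]] = {}
        self.misses = 0  # number of times the underlying processor ran

    def _key(self, modality: str, data: bytes) -> str:
        hasher = hashlib.blake2b()  # stand-in for blake3 (assumption)
        for field in (modality.encode("utf-8"), data):
            # Length-prefix each field so field boundaries are unambiguous
            hasher.update(len(field).to_bytes(8, "little"))
            hasher.update(field)
        return hasher.hexdigest()

    def __call__(self, modality: str, items: List[bytes]) -> List[List[int]]:
        outputs = []
        for data in items:
            key = self._key(modality, data)
            if key not in self._cache:
                self.misses += 1
                self._cache[key] = self._process_item(data)
            outputs.append(self._cache[key])
        return outputs


def fake_image_processor(data: bytes) -> List[int]:
    """Placeholder for an expensive modality-specific processor."""
    return list(data)


processor = CachedItemProcessor(fake_image_processor)
first = processor("image", [b"\x01\x02", b"\x03"])
second = processor("image", [b"\x03", b"\x04"])  # b"\x03" is served from cache
print(processor.misses)
```

Keying on individual items rather than on the whole request is what lets a partially overlapping second request reuse earlier work.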