fix llava/llava next issue when working with AutoProcessor #1674

Open. Wants to merge 1 commit into base: main.
Conversation

sywangyi (Collaborator) commented:

The model crashes when running the following script:

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()


model_id = "llava-hf/llava-1.5-7b-hf"
#model_id = "llava-hf/llama3-llava-next-8b-hf"
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
model = model.to("hpu")
model.generation_config.pad_token_id = model.generation_config.eos_token_id
processor = AutoProcessor.from_pretrained(model_id, padding_side="left")

# Define a chat history and use `apply_chat_template` to get a correctly formatted prompt
# Each value in "content" has to be a list of dicts with types ("text", "image")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are these?"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(images=raw_image, text=prompt, return_tensors='pt')
for t in inputs:
    if torch.is_tensor(inputs[t]):
        inputs[t] = inputs[t].to("hpu")
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
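
Note (an annotation, not part of the original report): the failures below suggest that the AutoProcessor has already expanded the single <image> placeholder into one token per image patch, while the Gaudi model path still assumes legacy processing (one placeholder per image) and tries to expand again. A minimal sketch for checking which case a batch falls into, assuming the standard transformers config fields image_token_index and image_seq_length:

import torch

def count_image_tokens(input_ids: torch.Tensor, image_token_index: int) -> int:
    # Maximum number of image placeholder tokens in any sequence of the batch.
    return int((input_ids == image_token_index).sum(dim=1).max())

# Hypothetical usage with `inputs` and `model` from the script above:
# n = count_image_tokens(inputs["input_ids"], model.config.image_token_index)
# legacy = n < model.config.image_seq_length  # True -> processor did not expand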

@sywangyi sywangyi requested a review from regisss as a code owner December 30, 2024 08:10
sywangyi (Collaborator, Author) commented:

For llava:
Traceback (most recent call last):
  File "/workspace/wangyi/optimum-habana/test.py", line 42, in <module>
    output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/wangyi/optimum-habana/optimum/habana/transformers/generation/utils.py", line 1468, in generate
    result = self._sample(
  File "/workspace/wangyi/optimum-habana/optimum/habana/transformers/generation/utils.py", line 2449, in _sample
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/wangyi/optimum-habana/optimum/habana/transformers/models/llava/modeling_llava.py", line 183, in forward
    outputs = self.language_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1847, in _call_impl
    return inner()
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1793, in inner
    result = forward_call(*args, **kwargs)
  File "/workspace/wangyi/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 1371, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1847, in _call_impl
    return inner()
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1793, in inner
    result = forward_call(*args, **kwargs)
  File "/workspace/wangyi/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 1226, in forward
    htcore.mark_step()
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/utils/internal.py", line 36, in lazy_wrapper
    func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/step_closure.py", line 71, in mark_step
    htcore._mark_step(device_str, sync)
RuntimeError: The expanded size of the tensor (331776) must match the existing size (576) at non-singleton dimension 0. Target sizes: [331776, 4096]. Tensor sizes: [576, 4096]
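
Note (my reading of those sizes, not stated in the thread): llava-1.5 uses a CLIP ViT-L/14 vision tower at 336px, which yields (336 / 14)^2 = 576 patch features per image. If the processor has already expanded the prompt to 576 <image> tokens and the legacy merge expands each of them by 576 again, the target becomes 576 * 576 = 331776 rows against only 576 available feature rows, matching the sizes in the error:

# Quick sanity check of the arithmetic above:
assert (336 // 14) ** 2 == 576
assert 576 * 576 == 331776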

sywangyi (Collaborator, Author) commented:

For llava_next:
Traceback (most recent call last):
  File "/workspace/wangyi/optimum-habana/test.py", line 43, in <module>
    output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/wangyi/optimum-habana/optimum/habana/transformers/generation/utils.py", line 1468, in generate
    result = self._sample(
  File "/workspace/wangyi/optimum-habana/optimum/habana/transformers/generation/utils.py", line 2440, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/workspace/wangyi/optimum-habana/optimum/habana/transformers/models/llava_next/modeling_llava_next.py", line 342, in prepare_inputs_for_generation
    self._merge_input_ids_with_image_features(
  File "/workspace/wangyi/optimum-habana/optimum/habana/transformers/models/llava_next/modeling_llava_next.py", line 213, in _merge_input_ids_with_image_features
    raise ValueError(
ValueError: The input provided to the model are wrong. The number of image tokens is 2340 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
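
Note (same pattern, as I read it): here the check fails before any merge happens, because the processor has already expanded the placeholder into 2340 patch tokens while the legacy merge still expects exactly one <image> token per image. A hedged paraphrase of the failing invariant (not the actual source):

# num_image_tokens = (input_ids == config.image_token_index).sum()
# if num_image_tokens != num_images:  # 2340 != 1 in this run
#     raise ValueError("The input provided to the model are wrong. ...")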

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sywangyi (Collaborator, Author) commented:

@lkk12014402 @yuanwu2017, please help review.

legacy_processing = (
    (input_ids == self.config.image_token_index).sum(1).max() < self.config.image_seq_length
) or ((input_ids.shape[-1] == 1 if token_idx is None else token_idx == 1) and pixel_values is not None)
if token_idx is not None and pixel_values is not None and legacy_processing:
A contributor commented on this hunk:
Does it affect static shapes optimization?
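
How I read the condition in that hunk (an annotation; the config field names mirror the diff, but the helper itself is hypothetical):

def is_legacy_processing(input_ids, pixel_values, token_idx, config):
    # Fewer <image> tokens than image_seq_length: the processor did not
    # pre-expand the placeholder, so the model must merge image features itself.
    not_expanded = (
        (input_ids == config.image_token_index).sum(1).max() < config.image_seq_length
    )
    # A single-token step (token_idx == 1, or one input id when token_idx is
    # unset) that still carries pixel_values is also routed to the legacy path.
    single_step_with_pixels = (
        input_ids.shape[-1] == 1 if token_idx is None else token_idx == 1
    ) and pixel_values is not None
    return not_expanded or single_step_with_pixels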
