LLava-Next example is broken #31713

Open

isidentical opened this issue Jun 29, 2024 · 2 comments
Comments

@isidentical

System Info

requirements = [
    "torch>=2.3.0",
    "torchvision",
    "torchaudio",
    "transformers @ git+https://github.com/huggingface/transformers.git@e65502951593a76844e872fee9c56b805598538a",
    "bitsandbytes",
    "accelerate",
    "sse-starlette",
    "sentencepiece",
]

Python 3.11

Who can help?

@amyeroberts

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image
import requests

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf")

model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-34b-hf", torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.to("cuda:0")

# prepare image and text prompt, using the appropriate prompt template
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|><|im_start|>assistant\n"

inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")

# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=100)

print(processor.decode(output[0], skip_special_tokens=True))

Expected behavior

Not this:

  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/root/.pyenv/versions/3.11.3/lib/python3.11/contextlib.py", line 204, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/cicero/projects/fal-isolate-cloud/.venv/lib/python3.11/site-packages/fal/app.py", line 183, in lifespan
  File "/home/cicero/projects/fal-isolate-cloud/.venv/lib/python3.11/site-packages/fal/app.py", line 31, in _call_any_fn
  File "registry/text/llava_next.py", line 102, in setup
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/transformers/generation/utils.py", line 2651, in _sample
    outputs = self(
              ^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 806, in forward
    inputs_embeds, attention_mask, position_ids, labels, _ = self._merge_input_ids_with_image_features(
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 543, in _merge_input_ids_with_image_features
    raise ValueError(
ValueError: Number of image tokens in input_ids (0) different from num_images (1).
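
For anyone hitting the same failure, a quick way to confirm the token-id mismatch (a diagnostic sketch, not part of the original report) is to compare the id the tokenizer assigns to the <image> placeholder against the id the model's config tells it to search for:

from transformers import AutoConfig, LlavaNextProcessor

model_id = "llava-hf/llava-v1.6-34b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# Id actually produced when "<image>" appears in the prompt
print(processor.tokenizer.convert_tokens_to_ids("<image>"))
# Id that _merge_input_ids_with_image_features searches input_ids for
print(config.image_token_index)

If the two ids differ, the model finds zero image tokens in input_ids and raises the ValueError above.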
@qubvel
Member

qubvel commented Jul 1, 2024

cc @NielsRogge

@NielsRogge
Contributor

NielsRogge commented Jul 1, 2024

Thanks for reporting, I can reproduce the issue. The cause seems to be this commit: https://huggingface.co/llava-hf/llava-v1.6-34b-hf/commit/0222a5505d403cb64dfb92735abf37bf0d38de3e, which updates the image token index to 64003 instead of 64000, while the model itself still expects 64000. Pinging @zucchini-nlp
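
Until the checkpoint metadata is corrected, a possible stopgap (a sketch based on the diagnosis above, not an officially confirmed fix) is to align the id the model searches for with the id the tokenizer actually emits, reusing the processor and model objects from the reproduction script:

# Hedged workaround: point model.config at the id the tokenizer really
# produces for "<image>", so the merge step can find the placeholder.
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
model.config.image_token_index = image_token_id

Alternatively, pinning the repository to a revision from before the linked commit via the revision argument of from_pretrained should sidestep the mismatch.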
