LLava-Next example is broken #31713

Open

isidentical opened this issue Jun 29, 2024 · 2 comments
Comments

@isidentical

System Info

requirements = [
    "torch>=2.3.0",
    "torchvision",
    "torchaudio",
    "transformers @ git+https://github.com/huggingface/transformers.git@e65502951593a76844e872fee9c56b805598538a",
    "bitsandbytes",
    "accelerate",
    "sse-starlette",
    "sentencepiece",
]

Python 3.11

Who can help?

@amyeroberts

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image
import requests

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf")

model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-34b-hf", torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.to("cuda:0")

# prepare image and text prompt, using the appropriate prompt template
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|><|im_start|>assistant\n"

inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")

# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=100)

print(processor.decode(output[0], skip_special_tokens=True))

Expected behavior

Not this:

  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/root/.pyenv/versions/3.11.3/lib/python3.11/contextlib.py", line 204, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/cicero/projects/fal-isolate-cloud/.venv/lib/python3.11/site-packages/fal/app.py", line 183, in lifespan
  File "/home/cicero/projects/fal-isolate-cloud/.venv/lib/python3.11/site-packages/fal/app.py", line 31, in _call_any_fn
  File "registry/text/llava_next.py", line 102, in setup
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/transformers/generation/utils.py", line 2651, in _sample
    outputs = self(
              ^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 806, in forward
    inputs_embeds, attention_mask, position_ids, labels, _ = self._merge_input_ids_with_image_features(
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/isolate/virtualenv/0092dee401903eee4639c7db272232c3de3029ab800f663163646533f75538c1/lib/python3.11/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 543, in _merge_input_ids_with_image_features
    raise ValueError(
ValueError: Number of image tokens in input_ids (0) different from num_images (1).
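
For anyone hitting the same failure, a quick way to confirm the token-id mismatch (a diagnostic sketch, not part of the original report) is to compare the id the tokenizer assigns to the <image> placeholder against the id the model's config tells it to search for:

from transformers import AutoConfig, LlavaNextProcessor

model_id = "llava-hf/llava-v1.6-34b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# Id actually produced when "<image>" appears in the prompt
print(processor.tokenizer.convert_tokens_to_ids("<image>"))
# Id that _merge_input_ids_with_image_features searches input_ids for
print(config.image_token_index)

If the two ids differ, the model finds zero image tokens in input_ids and raises the ValueError above.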
@qubvel
Member

qubvel commented Jul 1, 2024

cc @NielsRogge

@NielsRogge
Contributor

NielsRogge commented Jul 1, 2024

Thanks for reporting, I can reproduce the issue. The cause seems to be this commit: https://huggingface.co/llava-hf/llava-v1.6-34b-hf/commit/0222a5505d403cb64dfb92735abf37bf0d38de3e, which updates the image token index to 64003 instead of 64000, while the model itself still expects 64000. Pinging @zucchini-nlp
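
Until the checkpoint metadata is corrected, a possible stopgap (a sketch based on the diagnosis above, not an officially confirmed fix) is to align the id the model searches for with the id the tokenizer actually emits, reusing the processor and model objects from the reproduction script:

# Hedged workaround: point model.config at the id the tokenizer really
# produces for "<image>", so the merge step can find the placeholder.
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
model.config.image_token_index = image_token_id

Alternatively, pinning the repository to a revision from before the linked commit via the revision argument of from_pretrained should sidestep the mismatch.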
