RuntimeError: CUDA error: device-side assert triggered #220

Open
hessaAlawwad opened this issue Nov 14, 2024 · 3 comments

@hessaAlawwad

Hello,
I am trying the following code to test sending multiple images:

import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "},
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    [image, image],  # one image per {"type": "image"} placeholder, in order
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))

and got the error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-23-5e73b30f8d1d> in <cell line: 34>()
     32 ).to(model.device)
     33 
---> 34 output = model.generate(**inputs, max_new_tokens=30)
     35 print(processor.decode(output[0]))

3 frames
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py in _has_unfinished_sequences(self, this_peer_finished, synced_gpus, device, cur_len, max_length)
   2411                 if this_peer_finished_flag.item() == 0.0:
   2412                     return False
-> 2413             elif this_peer_finished:
   2414                 return False
   2415             return True

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
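
As the message says, setting CUDA_LAUNCH_BLOCKING=1 makes the error surface at the actual failing call instead of asynchronously; a minimal sketch, assuming it runs before CUDA is initialized:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before any CUDA work, e.g. at the top of the notebook

# ... then re-run the snippet above to see which operation actually asserts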

How can I solve it?

@ashwinb
Contributor

ashwinb commented Nov 14, 2024

cc @init27, this is a Hugging Face-specific issue.

@init27

init27 commented Nov 14, 2024

Thanks Ashwin!
@hessaAlawwad: this is by design; for the current model, we only recommend chatting with one image per session.
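
For reference, a minimal single-image sketch, reusing the model, processor, and image already loaded in the snippet above:

# Recommended pattern: one {"type": "image"} placeholder and one image per session.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "},
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))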

@Sosycs

Sosycs commented Nov 15, 2024

@init27 thank you sir, but when you say "we only recommend", do you mean it is possible to chat with multiple images?
Because if so, why would we get an error? Shouldn't it just run and give weak results?
