Feature request
Currently, llama3.2 requires that a batch contain either no images at all or at least one image per example. Is there an easy workaround (apart from feeding dummy images, which is computationally expensive) to allow some examples in a batch to have images while others have none?
Current error message:
"If a batch of text is provided, there should be either no images or at least one image per sample"
from transformers/src/transformers/models/mllama/processing_mllama.py, line 303 (commit 3f860db)
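One workaround that avoids dummy images is to split a mixed batch by image presence and call the processor once per sub-batch, so each call individually satisfies the constraint. Below is a minimal sketch; the checkpoint name, batch layout, and helper name are assumptions for illustration, not from this issue:

```python
from transformers import AutoProcessor

# Checkpoint name is an assumption for illustration.
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision")

def process_mixed_batch(texts, images):
    """Split a mixed batch into image and text-only sub-batches.

    texts:  list[str]; image samples are assumed to already contain
            the <|image|> token in their prompt
    images: list of (PIL.Image.Image | None), aligned with `texts`
    """
    with_img = [i for i, im in enumerate(images) if im is not None]
    text_only = [i for i, im in enumerate(images) if im is None]

    batches = {}
    if with_img:  # every sample in this sub-batch has exactly one image
        batches["vision"] = processor(
            text=[texts[i] for i in with_img],
            images=[[images[i]] for i in with_img],
            padding=True,
            return_tensors="pt",
        )
    if text_only:  # no images at all, so the constraint is satisfied
        batches["text"] = processor(
            text=[texts[i] for i in text_only],
            padding=True,
            return_tensors="pt",
        )
    return batches  # run the model separately on each sub-batch
```

The cost is two processor/model calls per mixed batch, but no compute is spent encoding dummy images.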
Motivation
Not every example has an image, but some do, so mixed batches arise naturally in real datasets.
Your contribution
I could possibly submit a PR.
While it is possible to allow images in only some of the text samples during processing, that would hurt model performance. The model decides whether to run cross-attention based on image presence, and text-only inputs are not expected to perform any cross-attention. As far as I know, that was the reason behind this restriction.
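If mixed batches mostly come up during training, another pattern consistent with the explanation above is to group examples by modality at the dataloader level, so every batch is all-image or all-text before it ever reaches the processor. A minimal sketch; the dataset layout and all names here are illustrative assumptions:

```python
from torch.utils.data import DataLoader

class ModalityGroupedBatchSampler:
    """Yields index batches that are either all-image or all-text,
    so each batch meets the processor's uniformity requirement."""

    def __init__(self, dataset, batch_size):
        self.batch_size = batch_size
        # Assumes each example is a dict with an optional "image" key.
        self.image_idx = [i for i, ex in enumerate(dataset) if ex.get("image") is not None]
        self.text_idx = [i for i, ex in enumerate(dataset) if ex.get("image") is None]

    def __iter__(self):
        for bucket in (self.image_idx, self.text_idx):
            for start in range(0, len(bucket), self.batch_size):
                yield bucket[start:start + self.batch_size]

    def __len__(self):
        def n_batches(bucket):
            return (len(bucket) + self.batch_size - 1) // self.batch_size
        return n_batches(self.image_idx) + n_batches(self.text_idx)

# Usage:
# loader = DataLoader(dataset, batch_sampler=ModalityGroupedBatchSampler(dataset, 8))
```

This keeps each batch homogeneous without touching the processor, at the cost of a non-uniform shuffling order across modalities.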