Feature request
Currently, llama3.2 requires that a batch contain either no images at all or at least one image per example. Is there an easy workaround (apart from feeding dummy images, which is computationally expensive) to allow some examples in a batch to have images while others have none?
Current error message:
"If a batch of text is provided, there should be either no images or at least one image per sample"
from transformers/src/transformers/models/mllama/processing_mllama.py, line 303 (commit 3f860db)
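One workaround that avoids dummy images is to split a mixed batch by image presence and call the processor once per sub-batch, so each call individually satisfies the constraint. Below is a minimal sketch; the checkpoint name, batch layout, and helper name are assumptions for illustration, not from this issue:

```python
from transformers import AutoProcessor

# Checkpoint name is an assumption for illustration.
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision")

def process_mixed_batch(texts, images):
    """Split a mixed batch into image and text-only sub-batches.

    texts:  list[str]; image samples are assumed to already contain
            the <|image|> token in their prompt
    images: list of (PIL.Image.Image | None), aligned with `texts`
    """
    with_img = [i for i, im in enumerate(images) if im is not None]
    text_only = [i for i, im in enumerate(images) if im is None]

    batches = {}
    if with_img:  # every sample in this sub-batch has exactly one image
        batches["vision"] = processor(
            text=[texts[i] for i in with_img],
            images=[[images[i]] for i in with_img],
            padding=True,
            return_tensors="pt",
        )
    if text_only:  # no images at all, so the constraint is satisfied
        batches["text"] = processor(
            text=[texts[i] for i in text_only],
            padding=True,
            return_tensors="pt",
        )
    return batches  # run the model separately on each sub-batch
```

The cost is two processor/model calls per mixed batch, but no compute is spent encoding dummy images.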
Motivation
Not every example has an image, but some do, so mixed batches arise naturally in real datasets.
Your contribution
I could possibly submit a PR.
While it is possible to allow images in only some of the text samples during processing, that would hurt model performance. The model decides whether to run cross-attention based on image presence, and text-only inputs are not expected to perform any cross-attention. As far as I know, that was the reason behind this restriction.
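If mixed batches mostly come up during training, another pattern consistent with the explanation above is to group examples by modality at the dataloader level, so every batch is all-image or all-text before it ever reaches the processor. A minimal sketch; the dataset layout and all names here are illustrative assumptions:

```python
from torch.utils.data import DataLoader

class ModalityGroupedBatchSampler:
    """Yields index batches that are either all-image or all-text,
    so each batch meets the processor's uniformity requirement."""

    def __init__(self, dataset, batch_size):
        self.batch_size = batch_size
        # Assumes each example is a dict with an optional "image" key.
        self.image_idx = [i for i, ex in enumerate(dataset) if ex.get("image") is not None]
        self.text_idx = [i for i, ex in enumerate(dataset) if ex.get("image") is None]

    def __iter__(self):
        for bucket in (self.image_idx, self.text_idx):
            for start in range(0, len(bucket), self.batch_size):
                yield bucket[start:start + self.batch_size]

    def __len__(self):
        def n_batches(bucket):
            return (len(bucket) + self.batch_size - 1) // self.batch_size
        return n_batches(self.image_idx) + n_batches(self.text_idx)

# Usage:
# loader = DataLoader(dataset, batch_sampler=ModalityGroupedBatchSampler(dataset, 8))
```

This keeps each batch homogeneous without touching the processor, at the cost of a non-uniform shuffling order across modalities.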