Llama3.2: Allow batch to have #35937

Open
maximilianmordig opened this issue Jan 28, 2025 · 1 comment
Labels: Feature request (request for a new feature), VLM

Comments

@maximilianmordig

Feature request

Currently, Llama 3.2 requires that either no example in a batch has images or every example has at least one image. Is there an easy workaround (other than feeding dummy images, which is computationally expensive) that allows some examples to have images and others to have none?

Current error message:
"If a batch of text is provided, there should be either no images or at least one image per sample"

Motivation

In a typical dataset, some examples have images and others do not, so mixed batches come up naturally.

Your contribution

Could possibly submit a PR

maximilianmordig added the Feature request label on Jan 28, 2025
@zucchini-nlp
Member

Hey @maximilianmordig !

While the processor could allow images in only some of the text samples, that would hurt model performance. The model decides whether to do cross-attn based on whether images are present, and text-only inputs are not expected to go through any cross-attention. AFAIK that was the reason behind the restriction.
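
One workaround that stays within this constraint is to split a mixed batch into a text-only group and a with-images group, run each group separately, and then restore the original order. The sketch below only shows the bookkeeping in plain Python. The sample layout (dicts with `text` and `images` keys) and the helper names `satisfies_image_rule`, `split_by_image_presence`, and `merge_in_order` are assumptions for illustration, not part of transformers, and `satisfies_image_rule` just paraphrases the rule stated in the error message rather than reproducing the library's check.

```python
from typing import Any, Dict, List, Sequence, Tuple

Sample = Dict[str, Any]  # hypothetical layout: {"text": str, "images": list}


def satisfies_image_rule(images_per_sample: Sequence[int]) -> bool:
    """Paraphrase of the error message: no images at all, or at least one per sample."""
    no_images = all(n == 0 for n in images_per_sample)
    every_sample_has_image = all(n > 0 for n in images_per_sample)
    return no_images or every_sample_has_image


def split_by_image_presence(
    samples: Sequence[Sample],
) -> Tuple[List[Tuple[int, Sample]], List[Tuple[int, Sample]]]:
    """Partition samples into (text_only, with_images), keeping original indices."""
    text_only: List[Tuple[int, Sample]] = []
    with_images: List[Tuple[int, Sample]] = []
    for idx, sample in enumerate(samples):
        group = with_images if sample.get("images") else text_only
        group.append((idx, sample))
    return text_only, with_images


def merge_in_order(*groups: Sequence[Tuple[int, Any]]) -> List[Any]:
    """Recombine per-group (index, result) pairs into the original batch order."""
    indexed = [pair for group in groups for pair in group]
    return [result for _, result in sorted(indexed, key=lambda pair: pair[0])]


# A mixed batch: samples 0 and 2 have images, sample 1 is text-only.
batch = [
    {"text": "Describe the image.", "images": ["img0"]},
    {"text": "Hello there.", "images": []},
    {"text": "Compare the two images.", "images": ["img1", "img2"]},
]

mixed_ok = satisfies_image_rule([len(s["images"]) for s in batch])
text_only, with_images = split_by_image_presence(batch)

# Each group on its own satisfies the rule, so it could be processed separately.
text_only_ok = satisfies_image_rule([len(s["images"]) for _, s in text_only])
with_images_ok = satisfies_image_rule([len(s["images"]) for _, s in with_images])

# Stand-in for per-group model outputs: here just the sample text.
restored = merge_in_order(
    [(i, s["text"]) for i, s in text_only],
    [(i, s["text"]) for i, s in with_images],
)
```

The trade-off is two forward passes instead of one, but no dummy images are needed and text-only samples never go through cross-attention.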
