Can I get image-text fusion feature embeddings from this model using only an image? #15

Linn0910 · 2024-12-23T12:52:12Z

Hello! Thanks for your work! Since LLaVA has generation ability, I want to know whether this model can produce image-text fusion feature embeddings from only an image.
Thanks for your time and help!
Best regards!

kongds · 2024-12-26T17:07:54Z

Thank you for your interest in our work.

Did you mean embedding text from an image, such as OCR?
We have explored similar approaches in our paper, such as rendering captions onto images in Section 4.2. However, we have not attempted to represent both the image and text using a single image.

kongds closed this as completed Feb 7, 2025
