Can I get image-text fusion feature embeddings from this model using only an image? #15

Closed
Linn0910 opened this issue Dec 23, 2024 · 1 comment

Comments

@Linn0910

Hello! Thanks for your work! Since LLaVA has generation ability, I would like to know whether this model can produce image-text fusion feature embeddings from an image alone.
Thanks for your time and help!
Best regards!

@kongds
Owner

kongds commented Dec 26, 2024

Thank you for your interest in our work.

Did you mean embedding the text contained in an image, as in OCR?
We have explored similar approaches in our paper, such as rendering captions onto images in Section 4.2. However, we have not attempted to represent both the image and text using a single image.

@kongds closed this as completed Feb 7, 2025