34b vs 13b vs 7b: Impact of LLM size vs vision encoder size? #1135
matthiasgeihs asked this question in Q&A (unanswered)
I've experimented a bit with the different model sizes.
It seems that 34b is indeed significantly better at image recognition tasks.
I'd like to understand: how does the size of the language model affect visual recognition performance, given that the vision encoder is always the same size? Which parts of the architecture have the most impact on recognition performance? (See the rough parameter comparison below for scale.)
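For scale, here's a back-of-the-envelope comparison of where the parameters sit. It assumes the usual LLaVA-style setup of a CLIP ViT-L/14 vision encoder at roughly 0.3B parameters paired with each LLM size; the exact encoder and counts are my assumption, not stated above.

```python
# Rough parameter-count comparison for a LLaVA-style model.
# Assumption: a CLIP ViT-L/14 vision encoder (~0.3B params) is shared
# across all variants, while only the LLM changes size.

VISION_ENCODER_PARAMS = 0.3e9  # approximate; CLIP ViT-L/14 is ~304M

llm_sizes = {"7b": 7e9, "13b": 13e9, "34b": 34e9}

for name, llm_params in llm_sizes.items():
    total = llm_params + VISION_ENCODER_PARAMS
    share = VISION_ENCODER_PARAMS / total * 100
    print(f"{name}: vision encoder is {share:.1f}% of {total / 1e9:.1f}B total params")
```

Even in the 7b case the vision encoder is only a few percent of the total parameters, so the differences I'm seeing presumably come from the LLM's capacity to interpret the same visual features rather than from the features themselves.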