[Model] VLM2Vec, the first multimodal embedding model in vLLM #9303
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Okay, I'd be happy to try supporting LoRA.
@@ -461,3 +463,50 @@ def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
                            if self.config.tie_word_embeddings else None),
         )
         loader.load_weights(weights)
+
+
+class Gemma2EmbeddingModel(nn.Module, SupportsPP):
I'm preemptively moving them into the same file to be consistent with the upcoming BERT PR. (#9056)
Nice work! There were some typos in our paper: we actually used the last-token representation, not the EOS-token representation. I see that you already use the last token as the representation, which is the correct implementation.
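For readers following along, here is a minimal sketch of what last-token pooling means in practice. It is not the code from this PR: the function name `last_token_pool`, the parameter names, the flat (concatenated) layout of the hidden states, and the optional L2 normalization are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def last_token_pool(
    hidden_states: Sequence[Sequence[float]],
    prompt_lens: Sequence[int],
    normalize: bool = True,
) -> List[List[float]]:
    """Pick the final token's hidden state for each prompt.

    Assumption: `hidden_states` holds the token vectors of all prompts
    concatenated back to back, and `prompt_lens` gives each prompt's length.
    """
    pooled: List[List[float]] = []
    offset = 0
    for n in prompt_lens:
        if n <= 0:
            raise ValueError("each prompt must contain at least one token")
        # The last token of this prompt sits at the end of its slice.
        last = list(hidden_states[offset + n - 1])
        offset += n
        if normalize:
            # L2-normalize so that dot products behave like cosine similarity.
            norm = math.sqrt(sum(x * x for x in last))
            if norm > 0:
                last = [x / norm for x in last]
        pooled.append(last)
    if offset != len(hidden_states):
        raise ValueError("prompt_lens does not cover all hidden states")
    return pooled


if __name__ == "__main__":
    # Two prompts of lengths 2 and 3, with 2-dimensional hidden states.
    states = [[1.0, 0.0], [3.0, 4.0], [0.0, 2.0], [0.0, 0.0], [5.0, 12.0]]
    print(last_token_pool(states, [2, 3], normalize=False))
```

The point of the sketch is that the embedding is taken by position (the final token of each prompt), not by searching for a particular EOS token id.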
LGTM, especially since the model authors have confirmed that the implementation is correct :)
Oops, it seems the newly added vision embedding test isn't included in the test pipeline: vllm/.buildkite/test-pipeline.yaml, lines 337 to 349 in 7abba39
We might need to open another PR to include it.
Nice catch, I have opened #9406
What would it take to support this in the online engine, and when might that be expected? ^^
Maybe sometime in the next two weeks.
Quick heads-up that it's done now!
When will the LoRA version be online? Actually, the LoRA version works better.
Currently, vLLM only supports LoRA for the language backbone of VLMs; some re-architecture work is necessary to extend this to the vision encoder. @jeejeelee do you have a timeframe regarding this?
I guess you meant "vLLM doesn't support LoRA" instead of "vLLM only support LoRA"?
I mean that vLLM supports LoRA on the language backbone, but not LoRA on the vision encoder of VLMs.
Got it. Thanks for the explanation!
Support the VLM2Vec embedding model by TIGER-Lab.
This is low-hanging fruit, as the model architecture is exactly the same as Phi3V.
Future work, in order of priority:
@jeejeelee are you available to help provide LoRA support for this model after this PR?