[Model] VLM2Vec, the first multimodal embedding model in vLLM #9303

Merged
merged 13 commits into from
Oct 16, 2024

@DarkLight1337 DarkLight1337 commented Oct 12, 2024

Support VLM2Vec embedding model by TIGER-Lab.

This is low-hanging fruit because the model architecture is exactly the same as Phi3V.

Future work, in order of priority:

  1. Add CLI option to specify whether to use a model for generation or embedding so we don't have to hardcode the model name when the same model architecture can be used for both.
  2. Add a multimodal embedding API to the OpenAI-compatible server.
  3. Support more multimodal embedding models, e.g. E5-V, which can be supported similarly with our existing LLaVA-NeXT implementation.
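
The first item above can be pictured as a lookup keyed by both architecture and task, so that one architecture can map to either a generation or an embedding implementation. The sketch below is only an illustration of that idea: the `--task` flag, the `REGISTRY` table, `resolve_model`, and the implementation names are all hypothetical and are not taken from this PR or from vLLM's actual registry.

```python
import argparse
from typing import Dict, List, Optional, Tuple

# (architecture, task) -> implementation name.
# Every entry here is an illustrative placeholder, not vLLM's real registry.
REGISTRY: Dict[Tuple[str, str], str] = {
    ("Phi3VForCausalLM", "generate"): "Phi3VGenerationModel",
    ("Phi3VForCausalLM", "embedding"): "Phi3VEmbeddingModel",
}


def resolve_model(architecture: str, task: str = "generate") -> str:
    """Pick an implementation from the architecture plus the requested task."""
    try:
        return REGISTRY[(architecture, task)]
    except KeyError:
        supported = sorted(t for a, t in REGISTRY if a == architecture)
        raise ValueError(
            f"{architecture!r} does not support task {task!r}; "
            f"supported tasks: {supported}") from None


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a hypothetical CLI where the user states the task explicitly."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--architecture", required=True)
    parser.add_argument("--task", choices=["generate", "embedding"],
                        default="generate")
    return parser.parse_args(argv)


args = parse_args(["--architecture", "Phi3VForCausalLM", "--task", "embedding"])
chosen = resolve_model(args.architecture, args.task)
print(chosen)
```

With the task stated explicitly, nothing needs to inspect the model name to decide between generation and embedding.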

@jeejeelee are you available to help provide LoRA support for this model after this PR?

@DarkLight1337 DarkLight1337 requested a review from ywang96 October 12, 2024 00:35

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR.
  • Enable auto-merge.

🚀

@jeejeelee
Collaborator

Okay, I'd be happy to try supporting LoRA.

@DarkLight1337 DarkLight1337 marked this pull request as ready for review October 12, 2024 07:13
@DarkLight1337 DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Oct 12, 2024
@@ -461,3 +463,50 @@ def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
                           if self.config.tie_word_embeddings else None),
        )
        loader.load_weights(weights)


class Gemma2EmbeddingModel(nn.Module, SupportsPP):
Member Author

I'm preemptively moving the embedding models into the same file to be consistent with the upcoming BERT PR (#9056).

@wenhuchen

Nice work! There were some typos in our paper. We actually used the last-token representation instead of the EOS-token representation. I saw that you already use the last token as the representation, which is the correct implementation.
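
To make the distinction concrete, here is a minimal sketch of last-token pooling in plain Python. It assumes the hidden states for a batch arrive as one flat list with all prompts concatenated, and that per-prompt token counts are available. The names `last_token_pool` and `prompt_lens` are illustrative and not vLLM's API, and the L2 normalization step is an assumption rather than something stated in this thread. The embedding is the hidden state at each prompt's final position, whatever token sits there, rather than the state at an EOS token.

```python
import math
from typing import List, Sequence


def last_token_pool(hidden_states: Sequence[Sequence[float]],
                    prompt_lens: Sequence[int],
                    normalize: bool = True) -> List[List[float]]:
    """Return the hidden state of the final token of each prompt.

    hidden_states: one vector per token, all prompts concatenated in order.
    prompt_lens:   number of tokens in each prompt, in the same order.
    """
    if any(length <= 0 for length in prompt_lens):
        raise ValueError("every prompt must contain at least one token")
    if sum(prompt_lens) != len(hidden_states):
        raise ValueError("prompt_lens must sum to the number of tokens")

    embeddings = []
    end = 0
    for length in prompt_lens:
        end += length
        # The last token of this prompt sits just before the running offset.
        vec = list(hidden_states[end - 1])
        if normalize:
            # Assumed extra step: scale to unit L2 norm.
            norm = math.sqrt(sum(x * x for x in vec))
            if norm > 0:
                vec = [x / norm for x in vec]
        embeddings.append(vec)
    return embeddings


# Toy batch: two prompts of 3 and 2 tokens, hidden size 2.
hidden = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [5.0, 0.0], [0.0, 6.0]]
pooled = last_token_pool(hidden, prompt_lens=[3, 2])
print(pooled)
```

In the toy batch, the first embedding comes from the third token and the second from the fifth, regardless of which token IDs those positions hold.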

Collaborator

@Isotr0py Isotr0py left a comment

LGTM, especially since the correctness of the implementation has been checked by the model vendors :)

@DarkLight1337 DarkLight1337 merged commit 7abba39 into main Oct 16, 2024
56 checks passed
@DarkLight1337 DarkLight1337 deleted the vlm2vec branch October 16, 2024 06:31
@Isotr0py
Collaborator

Oops, it seems that the newly added vision embedding test wasn't included in test-pipeline:

- label: Other Models Test # 6min
  #mirror_hardwares: [amd]
  source_file_dependencies:
    - vllm/
    - tests/models/embedding/language
    - tests/models/encoder_decoder/language
    - tests/models/encoder_decoder/vision_language
  commands:
    - pytest -v -s models/embedding/language
    - pytest -v -s models/encoder_decoder/language
    - pytest -v -s models/encoder_decoder/vision_language

We might need to open another PR to include it.
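
One way to catch this kind of gap mechanically is to compare the test directories that exist against the paths named in the pipeline's pytest commands. The sketch below assumes the commands are plain `pytest ... <path>` strings like those in the snippet above. The directory `models/embedding/vision_language` is a guess at where the new test lives, and `covered_paths` and `uncovered_test_dirs` are hypothetical helpers, not part of vLLM's CI tooling.

```python
from typing import Iterable, List


def covered_paths(commands: Iterable[str]) -> List[str]:
    """Extract the path arguments from `pytest` commands.

    Simplification: every non-flag token after `pytest` is treated as a path,
    so flags that take a value (such as `-k expr`) would be misread.
    """
    paths = []
    for command in commands:
        tokens = command.split()
        if not tokens or tokens[0] != "pytest":
            continue
        paths.extend(t.rstrip("/") for t in tokens[1:] if not t.startswith("-"))
    return paths


def uncovered_test_dirs(test_dirs: Iterable[str],
                        commands: Iterable[str]) -> List[str]:
    """Return test directories that no pytest command would collect."""
    covered = covered_paths(commands)
    missing = []
    for directory in test_dirs:
        directory = directory.rstrip("/")
        # Covered if a command names this directory or one of its parents.
        if not any(directory == c or directory.startswith(c + "/")
                   for c in covered):
            missing.append(directory)
    return missing


commands = [
    "pytest -v -s models/embedding/language",
    "pytest -v -s models/encoder_decoder/language",
    "pytest -v -s models/encoder_decoder/vision_language",
]
test_dirs = [
    "models/embedding/language",
    "models/embedding/vision_language",  # assumed location of the new test
    "models/encoder_decoder/language",
    "models/encoder_decoder/vision_language",
]
missing = uncovered_test_dirs(test_dirs, commands)
print(missing)
```

A check like this could run as a cheap lint step, though the real pipeline file would first need to be parsed to obtain the command list.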

@DarkLight1337
Member Author

Nice catch! I have opened #9406 to include it.

@jvlinsta

What will it take / when would it be expected for online engine support for this? ^^

@DarkLight1337
Member Author

What will it take / when would it be expected for online engine support for this? ^^

Maybe sometime in the next two weeks.

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
garg-amit pushed a commit to garg-amit/vllm that referenced this pull request Oct 28, 2024
FerdinandZhong pushed a commit to FerdinandZhong/vllm that referenced this pull request Oct 29, 2024
@DarkLight1337
Member Author

What will it take / when would it be expected for online engine support for this? ^^

Maybe sometime in the next two weeks.

Quick heads-up that it's done now!

@wenhuchen

When will the LoRA version be online? Actually, the LoRA version works better.

@DarkLight1337
Member Author

DarkLight1337 commented Nov 1, 2024

When will the LoRA version be online? Actually, the LoRA version works better.

Currently, vLLM only supports LoRA for the language backbone of VLMs - some re-arch work is necessary to extend this to the vision encoder. @jeejeelee do you have a timeframe regarding this?

@wenhuchen

When will the LoRA version be online? Actually, the LoRA version works better.

Currently, vLLM only supports LoRA for the language backbone of VLMs - some re-arch work is necessary to extend this to the vision encoder. @jeejeelee do you have a timeframe regarding this?

I guess you meant "vLLM doesn't support LoRA" instead of "vLLM only supports LoRA"?

@DarkLight1337
Member Author

I guess you meant "vLLM doesn't support LoRA" instead of "vLLM only supports LoRA"?

I mean that vLLM supports LoRA on the language backbone, but not LoRA on the vision encoder of VLMs.

@wenhuchen

Got it. Thanks for the explanation!
