
[MIEB] Fix get_fused_emebddings #1612

Merged: 3 commits into embeddings-benchmark:mieb on Dec 22, 2024

Conversation

izhx (Contributor) commented Dec 18, 2024

In PR #1583, I changed only the arguments of get_fused_embeddings but forgot to update the internal implementation accordingly, which caused runtime errors. (Sorry!)

They are now fixed.
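For context, here is a minimal sketch of what a fused-embeddings helper in a MIEB-style model wrapper has to do. The argument names (texts, images, fusion_mode, batch_size) and the sum-fusion fallback are assumptions for illustration, not the exact MIEB interface; the point is simply that the body must consume the same arguments the new signature declares, which is what this PR fixes.

```python
# Hypothetical sketch of a fused-embeddings helper in a MIEB-style wrapper.
# Argument names and the sum-fusion fallback are illustrative assumptions.
class FusedEncoderSketch:
    def get_text_embeddings(self, texts, batch_size=32):
        raise NotImplementedError  # provided by the concrete model wrapper

    def get_image_embeddings(self, images, batch_size=32):
        raise NotImplementedError  # provided by the concrete model wrapper

    def get_fused_embeddings(self, texts=None, images=None, fusion_mode="sum", batch_size=32):
        # Encode each modality only if it was actually passed in.
        text_emb = self.get_text_embeddings(texts, batch_size=batch_size) if texts is not None else None
        image_emb = self.get_image_embeddings(images, batch_size=batch_size) if images is not None else None

        if text_emb is not None and image_emb is not None:
            if fusion_mode == "sum":
                return text_emb + image_emb  # simple element-wise fusion fallback
            raise ValueError(f"Unsupported fusion mode: {fusion_mode}")

        # Single-modality input falls back to that modality's embeddings.
        return text_emb if text_emb is not None else image_emb
```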

I will run image (i), text (t), and image+text (it) encodings for each model to check; see the evaluation sketch after the progress table below.

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Checking progress:

| code | i2t - Flickr30kI2TRetrieval | it2t - LLaVAIT2TRetrieval (PR #1611) | note |
| --- | --- | --- | --- |
| align_models.py | | | |
| blip2_models.py | | | salesforce-lavis conflicts with sentence_transformers>=3.0 |
| blip_models.py | | | |
| clip_models.py | | | |
| cohere_v.py | | | I don't have a key |
| dino_models.py | | | DINO models only support image encoding. |
| e5_v.py | | | |
| evaclip_models.py | | | Code error when initializing the model |
| jina_clip.py | | | |
| nomic_models_vision.py | | | |
| openclip_models.py | | | |
| siglip_models.py | | | |
| vista_models.py | | | Code error when initializing the model |
| vlm2vec_models.py | | | |
| voyage_v | | | I don't have a key |
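As a rough sketch of how one cell of the table above can be reproduced, the public mteb API can run a single retrieval task against a single model. This assumes the MIEB branch is installed and that the task name "Flickr30kI2TRetrieval" and the model name below are registered there.

```python
# Rough sketch: run one image->text retrieval task for one model.
# Assumes the MIEB branch of mteb is installed and that both the task and
# model names below are registered in it.
import mteb

tasks = mteb.get_tasks(tasks=["Flickr30kI2TRetrieval"])
model = mteb.get_model("openai/clip-vit-base-patch32")

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
print(results)
```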

izhx marked this pull request as draft on December 18, 2024 13:38
izhx marked this pull request as ready for review on December 22, 2024 07:56
gowitheflow-1998 (Contributor) commented:
No worries, thanks a lot!

gowitheflow-1998 merged commit 6740207 into embeddings-benchmark:mieb on Dec 22, 2024
10 checks passed
izhx (Contributor, Author) commented Dec 26, 2024

@gowitheflow-1998 Hi, I fixed the VISTA model loading and a runtime error related to max_length (the code did not truncate inputs to the maximum input length).

May I submit a PR with these fixes?

Commit: izhx@e849843


Error message:

[WARNING|tokenization_utils_base.py:3928] 2024-12-26 19:53:40,650 >> Token indices sequence length is longer than the specified maximum sequence length for this model (537 > 512). Running this sequence through the model will result in indexing errors

ERROR:mteb.evaluation.MTEB:Error while evaluating OKVQAIT2TRetrieval: The size of tensor a (537) must match the size of tensor b (512) at non-singleton dimension 1
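The mismatch in the log above (537 vs. 512) is the classic symptom of tokenizing without truncation. A minimal sketch of this kind of fix, assuming a Hugging Face tokenizer inside the VISTA wrapper; the checkpoint name and surrounding details are illustrative, not the actual commit.

```python
# Minimal sketch of a truncation fix, assuming a Hugging Face tokenizer.
# Without truncation=True, sequences longer than model_max_length (512 here)
# reach the model and trigger the tensor-size mismatch shown above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")  # example checkpoint

batch = tokenizer(
    ["some very long passage ..."],
    padding=True,
    truncation=True,                        # the missing piece
    max_length=tokenizer.model_max_length,  # 512 for this model family
    return_tensors="pt",
)
```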

gowitheflow-1998 (Contributor) commented:

> @gowitheflow-1998 Hi, I fixed the VISTA model loading and a runtime error related to max_length (the code did not truncate inputs to the maximum input length).
>
> May I submit a PR with these fixes?

Yeah, it will be great to have your fix merged into MIEB! Also, I think ours was from the earliest version, before the import was moved from FlagEmbedding to visual_bge. It will be nice to have the latest!
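Regarding the visual_bge vs. FlagEmbedding import, a hedged sketch of a compatibility shim that works with either package layout; the exact module paths depend on the installed FlagEmbedding version and are assumptions here.

```python
# Compatibility shim for the VISTA (Visualized BGE) model class.
# Module paths are assumptions: newer FlagEmbedding releases ship the class in a
# separate visual_bge package, while older ones exposed it under FlagEmbedding.
try:
    from visual_bge.modeling import Visualized_BGE  # assumed newer layout
except ImportError:
    try:
        from FlagEmbedding.visual.modeling import Visualized_BGE  # assumed older layout
    except ImportError as err:
        raise ImportError(
            "Could not import Visualized_BGE; install visual_bge or an older FlagEmbedding."
        ) from err
```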

izhx mentioned this pull request on Dec 29, 2024