Abnormally low values for NanoBEIR benchmark #1627
Comments
@minsik-ai could you please specify:
Thanks in advance!
The original blog only presents results for
I've evaluated
Not matching results:
Matching results:
@Samoed's findings are the main difference I've seen!
Hmm, the difference here seems so stark that I will label it as a bug. It might be worth excluding it from the registered benchmarks until we have these results (comment it out in benchmarks.py).
Hi there! We evaluated E5-Mistral with the same prompts used in the original paper (using the SentenceTransformers implementation, IIRC). Just a quick reminder: scores between the Nano and full versions of BEIR are not supposed to match, even in magnitude. NanoBEIR is meant as a quick (and approximate) way to measure relative performance (i.e., if model A performs better than model B on NanoBEIR, it should also perform better on the full dataset). I've been postponing a full benchmark evaluation across multiple models for a while now, as I'm busy with other work-related stuff, but I'm hoping I can get some time over the holidays to do it properly.
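To make the "relative performance" point concrete, here is a minimal sketch of how one could check whether two benchmarks order models the same way. The function name `ranking_agreement`, the model names, and all scores are hypothetical placeholders, not measured results from NanoBEIR or BEIR.

```python
from itertools import combinations


def ranking_agreement(nano_scores, full_scores):
    """Fraction of model pairs that both benchmarks order the same way."""
    # Only models scored on both benchmarks can be compared.
    models = sorted(set(nano_scores) & set(full_scores))
    pairs = list(combinations(models, 2))
    if not pairs:
        raise ValueError("need at least two models scored on both benchmarks")
    agree = 0
    for a, b in pairs:
        nano_diff = nano_scores[a] - nano_scores[b]
        full_diff = full_scores[a] - full_scores[b]
        # Same sign means both benchmarks prefer the same model; ties count as disagreement.
        if nano_diff * full_diff > 0:
            agree += 1
    return agree / len(pairs)


# Placeholder numbers for illustration only (NOT real benchmark results).
nano = {"model_a": 0.60, "model_b": 0.55, "model_c": 0.40}
full = {"model_a": 0.50, "model_b": 0.45, "model_c": 0.47}
print(ranking_agreement(nano, full))
```

With these placeholder values, two of the three model pairs are ordered consistently. The absolute scores differ between the two dictionaries, which is expected; only the ordering matters for this check.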
Correct me if I'm wrong, but I think two NanoBEIR implementations (one from SentenceTransformers, one from MTEB) are being compared here. We are not comparing NanoBEIR to full BEIR. You can check the code to verify it.
Do we have any traction on this issue? I plan to look at the SentenceTransformers code to spot any differences; meanwhile, I would appreciate any insights on why SentenceTransformers and MTEB differ.
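As a starting point for that comparison, here is a rough sketch that lines up per-task scores from two implementations and flags the tasks where they disagree. The helper `flag_mismatches`, the tolerance, the task names, and the scores are all assumptions for illustration; it does not call SentenceTransformers or MTEB, and the scores would have to be collected from each library separately.

```python
def flag_mismatches(scores_a, scores_b, tolerance=0.01):
    """Return (task, gap) pairs whose scores differ by more than `tolerance`.

    The result is sorted with the largest absolute gap first.
    """
    # Compare only tasks that both implementations reported.
    shared = set(scores_a) & set(scores_b)
    gaps = {task: scores_a[task] - scores_b[task] for task in shared}
    flagged = [(task, gap) for task, gap in gaps.items() if abs(gap) > tolerance]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Placeholder scores for illustration only (NOT real results from either library).
impl_one = {"task_x": 0.50, "task_y": 0.62, "task_z": 0.310}
impl_two = {"task_x": 0.50, "task_y": 0.20, "task_z": 0.305}

for task, gap in flag_mismatches(impl_one, impl_two):
    print(f"{task}: gap of {gap:+.3f}")
```

Tasks that match within the tolerance drop out, so the output points directly at the datasets worth inspecting first.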
For I don't have any idea why the results are different, but I'll try to find issues too
Continuing from #1588
NanoBEIR performance on Touche2020 and NFCorpus is too low compared to reported values.
You can check out some of the values here: embeddings-benchmark/results#72