Feat: Add jasper #1591

Merged: 20 commits merged into main from add_jasper on Dec 23, 2024

Conversation

@Samoed (Collaborator) commented Dec 14, 2024

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

Full model results embeddings-benchmark/results#68

Some results are not matching, with a significant gap in AskUbuntuDupQuestions. @DunZhang, could you please take a look at what might be wrong?

I tried adding:

sentences = [i if i.strip() else "<|endoftext|>" for i in sentences]

but it had no effect. I also tried setting max_seq_length to 400, but that didn’t help either.
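
For reference, the two attempts look roughly like this when reproduced standalone (a sketch only; the actual PR applies them inside the mteb model wrapper, so exact attribute paths may differ):

```python
from sentence_transformers import SentenceTransformer

# standalone sketch of the two attempted tweaks (illustrative only)
model = SentenceTransformer("infgrad/jasper_en_vision_language_v1", trust_remote_code=True)
model.max_seq_length = 400  # the shorter sequence cap that was tried


def replace_empty(sentences: list[str]) -> list[str]:
    # swap empty / whitespace-only inputs for the EOS token, as in the snippet above
    return [s if s.strip() else "<|endoftext|>" for s in sentences]


embeddings = model.encode(replace_empty(["first text", "   "]), normalize_embeddings=True)
```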

| Task | Results from embeddings-benchmark/results#68 | Results from this implementation |
|---|---|---|
| BIOSSES | 0.846598 | 0.847182 |
| STS17 (en-ar) | 0.52721 | 0.526445 |
| STS17 (fr-en) | 0.841421 | 0.841915 |
| STS17 (en-en) | 0.910079 | 0.910827 |
| STS17 (nl-en) | 0.841977 | 0.841734 |
| STS17 (es-en) | 0.869965 | 0.870213 |
| STS17 (it-en) | 0.863103 | 0.863171 |
| STS17 (en-de) | 0.858372 | 0.858703 |
| STS17 (en-tr) | 0.555649 | 0.555269 |
| AskUbuntuDupQuestions | 0.673812 | 0.67403 |
| SummEval | 0.314212 | 0.314331 |
| SciFact | 0.80372 | 0.80493 |
| TweetSentimentExtractionClassification | 0.772411 | 0.772722 |
| EmotionClassification | 0.8773 | 0.8772 |
| SprintDuplicateQuestions | 0.964021 | 0.963987 |
| SCIDOCS | 0.24638 | 0.24713 |

Full results jasper_results.zip

Eval code
import mteb

tasks = mteb.get_tasks(
    tasks=[
        "BIOSSES",
        "STS17",
        "STS16",
        "AskUbuntuDupQuestions",
        "SummEval",
        "SciFact",
        "SCIDOCS",
        "TweetSentimentExtractionClassification",
        "EmotionClassification",
        "SprintDuplicateQuestions"
    ],
    languages=["eng"]
)

models = [ 
    "infgrad/jasper_en_vision_language_v1",
]

evaluation = mteb.MTEB(tasks=tasks)

for model_name in models:
    model = mteb.get_model(model_name)
    evaluation.run(
        model,
        output_folder="results",
        verbosity=2,
        raise_error=False,
        encode_kwargs={"batch_size": 8},
        # overwrite_results=True,
    )

@Samoed marked this pull request as ready for review December 14, 2024 21:15

@Samoed (Collaborator, Author) commented Dec 14, 2024

I think my implementation of the model has lower results on AskUbuntuDupQuestions because, in the authors' implementation, prompts for passages are only applied to retrieval tasks. In my implementation, prompts for passages are not applied to any tasks (including retrieval and reranking), resulting in worse performance. I'm not sure what to do in this case.
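
In code terms, the difference is roughly this (a hypothetical sketch, not the PR's actual code; the import path and the "Retrieval" type check are assumptions):

```python
from mteb.encoder_interface import PromptType  # assumed import path for mteb's prompt-type enum


def authors_apply_prompt(task_type: str, prompt_type: PromptType) -> bool:
    # authors' implementation (as described above): passages get a prompt only on retrieval tasks
    if prompt_type == PromptType.passage:
        return task_type == "Retrieval"
    return True


def this_pr_apply_prompt(task_type: str, prompt_type: PromptType) -> bool:
    # this implementation, at this point in the thread: passages never get a prompt
    return prompt_type != PromptType.passage
```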

@KennethEnevoldsen (Contributor):

I think it is perfectly fine to apply prompts only to some tasks (as long as it is clear in the implementation)

@Samoed (Collaborator, Author) commented Dec 14, 2024

I agree, but it's unclear why the passage prompt is applied only to retrieval tasks. Should it also be applied to InstructionsRetrieval or InstructionReranking?

@KennethEnevoldsen (Contributor):

Yea, that is a somewhat arbitrary decision (again, that is why it is nice to have the implementation). I would probably add it in both cases.

@Samoed (Collaborator, Author) commented Dec 15, 2024

I suggest waiting for @DunZhang's input on this.

@DunZhang:

@Samoed
Hi, it's an interesting thing 😄.

In English-MTEB, the Rerank tasks are really more like STS tasks, which means that the queries and 'passages' are symmetrical.
In other words, the so-called 'passages' are actually questions.

[Screenshots omitted: the AskUbuntuDupQuestions task type and example data; see the reference links below.]

As the data are symmetrical, both sides need a prompt, just like an STS task!

In contrast, the Rerank tasks in Chinese-MTEB are about query and passage (irrelevant to the present matter, so I won't go into it at length).

Finally, my model's usage:

For s2p tasks (e.g. retrieval), the query (s) needs a prompt, while the passage (p) does not.

For s2s tasks (e.g. STS), both sides need a prompt.
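
A minimal sketch of this rule with sentence-transformers (the prompt string is a placeholder for illustration, not the model's documented prompt):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("infgrad/jasper_en_vision_language_v1", trust_remote_code=True)
# placeholder instruction; the real model ships its own task prompts
prompt = "Instruct: Retrieve semantically similar text.\nQuery: "

# s2p (e.g. retrieval): only the query side is prompted
q = model.encode(["how do I upgrade ubuntu?"], prompt=prompt, normalize_embeddings=True)
p = model.encode(["You can upgrade Ubuntu with do-release-upgrade ..."], normalize_embeddings=True)

# s2s (e.g. STS, and English reranking per the comment above): both sides are prompted
a = model.encode(["A man is playing a guitar."], prompt=prompt, normalize_embeddings=True)
b = model.encode(["Someone plays the guitar."], prompt=prompt, normalize_embeddings=True)
```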

Reference:
https://huggingface.co/datasets/mteb/askubuntudupquestions-reranking?row=4
https://github.com/embeddings-benchmark/mteb/blob/main/docs/tasks.md

@DunZhang:

As for the other mismatched tasks, that's hard to explain; if the overall difference in the averages isn't too large, I think it's negligible.

Below are some reproduction details:

  • In my test, the model is bfloat16
  • max_length=400
  • attn_implementation=sdpa
  • vector_dim=12288
  • padding_side=right

Finally, do the normalization in SentenceTransformers' encode function:

encode_multi_process(..., normalize_embeddings=True)

or

encode(..., normalize_embeddings=True)

then convert to fp32:
vectors = vectors.astype(dtype=np.float32)

Actually, converting to fp32 first and then normalizing always gives a slightly higher score (a statistically non-significant difference).
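
Putting those settings together, a reproduction might look roughly like this (a sketch under the stated assumptions; the loading kwargs are illustrative, and vector_dim / padding_side are taken as the model and tokenizer defaults):

```python
import numpy as np
import torch
from sentence_transformers import SentenceTransformer

# sketch of the reproduction settings listed above (bfloat16, sdpa attention, max_length=400)
model = SentenceTransformer(
    "infgrad/jasper_en_vision_language_v1",
    trust_remote_code=True,
    model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "sdpa"},
)
model.max_seq_length = 400

sentences = ["example query", "example passage"]
# normalize inside encode, then cast the vectors to fp32 as described above
vectors = model.encode(sentences, normalize_embeddings=True, batch_size=8)
vectors = vectors.astype(np.float32)
```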

@Samoed (Collaborator, Author) commented Dec 17, 2024

I think I will try to apply the prompt based on the task type. Thank you for the feedback!

instruction = self.get_task_instruction(task_name, prompt_type)

# so that passage prompts won't be applied to s2p tasks
if prompt_type == PromptType.passage and task.metadata.type == "s2p":

I've updated it to apply the passage prompt only if the task type is s2s or p2p.
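
That is, something along these lines (a sketch based on the snippet above; the exact merged code may differ):

```python
# sketch: passage prompts are applied only for symmetric (s2s / p2p) task types
instruction = self.get_task_instruction(task_name, prompt_type)
if prompt_type == PromptType.passage and task.metadata.type not in ("s2s", "p2p"):
    instruction = None  # s2p passages are encoded without a prompt
```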

@KennethEnevoldsen (Contributor) commented Dec 22, 2024

Just noting here that it is perfectly valid to change the prompt conditional on the task.

E.g.

if prompt_type == PromptType.passage and task.metadata.name not in SYMETRIC_RETRIEVAL_TASKS:

@Samoed (Collaborator, Author) commented Dec 22, 2024

Yes, I've changed it so the prompt is now not applied to any s2p tasks; alternatively, I can strictly filter based on selected tasks.

@KennethEnevoldsen (Contributor):

Then I believe this is ready to merge?

@Samoed (Collaborator, Author) commented Dec 22, 2024

I think yes, but I was waiting to see if @DunZhang has something to add.

@DunZhang:

> I think yes, but I was waiting to see if @DunZhang has something to add.

Hi. I have nothing more to add 😄

@KennethEnevoldsen (Contributor):

Perfect - will merge then - thanks for taking the time

@KennethEnevoldsen merged commit ef5a068 into main on Dec 23, 2024 (10 checks passed)
@KennethEnevoldsen deleted the add_jasper branch on December 23, 2024 05:59