chore(deps): update dependency sentence_transformers to v3 #1291

renovate · 2024-06-11T07:10:42Z

This PR contains the following updates:

Package	Change	Age	Adoption	Passing	Confidence
sentence_transformers	`==2.7.0` -> `==3.3.1`

Release Notes

UKPLab/sentence-transformers (sentence_transformers)

`v3.3.1`: - Patch private model loading without environment variable

Compare Source

This patch release fixes a small issue with loading private models from Hugging Face using the token argument.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.3.1
# Inference only, use one of:
pip install sentence-transformers==3.3.1
pip install sentence-transformers[onnx-gpu]==3.3.1
pip install sentence-transformers[onnx]==3.3.1
pip install sentence-transformers[openvino]==3.3.1

Details

If you're loading a model under this scenario:

Your model is hosted on Hugging Face.
Your model is private.
You haven't set the HF_TOKEN environment variable via huggingface-cli login or some other approach.
You're passing the token argument to SentenceTransformer to load the model.

Then you may have encountered a crash in v3.3.0. This should be resolved now.
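As a quick way to check whether a setup was affected, the four conditions above can be written as a single predicate. This is only a sketch of the scenario as described in these notes; the function name and its parameters are hypothetical and not part of Sentence Transformers.

```python
def hit_by_v330_token_bug(
    hosted_on_hub: bool,
    is_private: bool,
    hf_token_env_set: bool,
    token_argument_passed: bool,
) -> bool:
    """Return True only if all four conditions from the list above hold.

    Hypothetical helper for illustration; it does not inspect any real model.
    """
    return (
        hosted_on_hub
        and is_private
        and not hf_token_env_set
        and token_argument_passed
    )


# Private Hub model, no HF_TOKEN in the environment, token passed as an argument
print(hit_by_v330_token_bug(True, True, False, True))
# Same setup, but HF_TOKEN is set (e.g. after `huggingface-cli login`)
print(hit_by_v330_token_bug(True, True, True, True))
```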

All Changes

[docs] Fix the prompt link to the training script by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3060
[Fix] Resolve loading private Transformer model in version 3.3.0 by @pesuchin in https://github.com/UKPLab/sentence-transformers/pull/3058

Full Changelog: UKPLab/sentence-transformers@v3.3.0...v3.3.1

`v3.3.0`: - Massive CPU speedup with OpenVINO int8 quantization; Training with Prompts for stronger models; NanoBEIR IR evaluation; PEFT compatibility; Transformers v4.46.0 compatibility

Compare Source

4x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free performance boost, convenient evaluation on NanoBEIR: a subset of a strong Information Retrieval benchmark, PEFT compatibility by easily adding/loading adapters, Transformers v4.46.0 compatibility, and Python 3.8 deprecation.

Install this version with:

# Training + Inference
pip install sentence-transformers[train]==3.3.0

# Inference only, use one of:
pip install sentence-transformers==3.3.0
pip install sentence-transformers[onnx-gpu]==3.3.0
pip install sentence-transformers[onnx]==3.3.0
pip install sentence-transformers[openvino]==3.3.0

OpenVINO int8 static quantization (https://github.com/UKPLab/sentence-transformers/pull/3025)

We introduce int8 static quantization using OpenVINO, a highly performant solution that outperforms all other current backends by a mile, at a minimal loss in performance. Here are the updated benchmarks:

Quantizing directly to the Hugging Face Hub

from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model

# 1. Load a model with the OpenVINO backend
model = SentenceTransformer("all-MiniLM-L6-v2", backend="openvino")

# 2. Quantize the model to int8, push the model to https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
# as a pull request:
export_static_quantized_openvino_model(
    model,
    quantization_config=None,
    model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
    push_to_hub=True,
    create_pr=True,
)

You can immediately use the model, even before it's merged, by using the revision argument:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
    revision=f"refs/pr/{pull_request_nr}"
)
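If you load from pull requests often, the keyword arguments can be assembled in one place. A minimal sketch: `pull_request_load_kwargs` is a hypothetical helper, and it only builds a dictionary using the `backend`, `model_kwargs` and `revision` arguments shown above.

```python
def pull_request_load_kwargs(pull_request_nr: int, file_name: str) -> dict:
    """Build keyword arguments for loading an OpenVINO model from a Hub pull request.

    Hypothetical helper: it mirrors the arguments used in the snippet above.
    """
    if pull_request_nr < 1:
        raise ValueError("pull_request_nr must be a positive integer")
    return {
        "backend": "openvino",
        "model_kwargs": {"file_name": file_name},
        # Hub pull requests are addressed as refs/pr/<number>
        "revision": f"refs/pr/{pull_request_nr}",
    }


kwargs = pull_request_load_kwargs(2, "openvino_model_qint8_quantized.xml")
print(kwargs["revision"])
# Intended use: SentenceTransformer("all-MiniLM-L6-v2", **kwargs)
```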

And once it's merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino/openvino_model_qint8_quantized.xml"},
)

Quantizing locally

You can also quantize a model and save it locally:

from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model
from optimum.intel import OVQuantizationConfig

model = SentenceTransformer("all-mpnet-base-v2", backend="openvino")
model.save_pretrained("path/to/all-mpnet-base-v2-local")
quantization_config = OVQuantizationConfig() # <- You can update settings here
export_static_quantized_openvino_model(model, quantization_config, "path/to/all-mpnet-base-v2-local")

And after quantizing, you can load it like so:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "path/to/all-mpnet-base-v2-local",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
)

All original Sentence Transformer models already have these new openvino_model_qint8_quantized.xml files, so you can load them without exporting directly! I would recommend making pull requests for other models on Hugging Face that you'd like to see quantized.

Learn more about how to Speed up Inference in the documentation: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
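For intuition about the term: "static" quantization generally means that value ranges are fixed ahead of time from calibration data, rather than measured per input at inference time. The sketch below shows that idea with simple symmetric int8 rounding in plain Python. It is not how OpenVINO implements quantization; the function names and the calibration values are made up for illustration.

```python
INT8_MIN, INT8_MAX = -128, 127


def calibrate_scale(calibration_values):
    """Derive one fixed scale so that the largest calibration magnitude maps to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / INT8_MAX if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest int8 step and clip to the int8 range."""
    return [max(INT8_MIN, min(INT8_MAX, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]


# The scale is computed once, up front, from (made-up) calibration data
scale = calibrate_scale([-1.0, -0.5, 0.0, 0.25, 1.27])

# At inference time the same scale is reused; 2.0 is outside the calibrated range and gets clipped
activations = [0.1, -0.33, 1.27, 2.0]
quantized = quantize(activations, scale)
restored = dequantize(quantized, scale)
print(quantized)
print(restored)
```

Values inside the calibrated range survive the round trip with a small error, which is the "minimal loss in performance" trade-off; values outside it are clipped, which is why representative calibration data matters.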

Training with Prompts (https://github.com/UKPLab/sentence-transformers/pull/2964)

Many modern embedding models are trained with “instructions” or “prompts” following the INSTRUCTOR paper. These prompts are strings, prefixed to each text to be embedded, allowing the model to distinguish between different types of text.

For example, the mixedbread-ai/mxbai-embed-large-v1 model was trained with "Represent this sentence for searching relevant passages: " as the prompt for all queries. This prompt is stored in the model configuration under the prompt name "query", so users can specify that prompt_name in model.encode:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
query_embedding = model.encode("What are Pandas?", prompt_name="query")

# or
# query_embedding = model.encode("What are Pandas?", prompt="Represent this sentence for searching relevant passages: ")
document_embeddings = model.encode([
    "Pandas is a software library written for the Python programming language for data manipulation and analysis.",
    "Pandas are a species of bear native to South Central China. They are also known as the giant panda or simply panda.",
    "Koala bears are not actually bears, they are marsupials native to Australia.",
])
similarity = model.similarity(query_embedding, document_embeddings)
print(similarity)

# => tensor([[0.7594, 0.7560, 0.4674]])
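Conceptually, a prompt is just a string placed in front of the text before it is embedded. The sketch below shows that prefixing step on plain strings. `with_prompt` and the `PROMPTS` dictionary are hypothetical stand-ins for the model's stored prompt configuration, not library code.

```python
# Stand-in for the prompts stored in a model's configuration
PROMPTS = {"query": "Represent this sentence for searching relevant passages: "}


def with_prompt(text, prompt=None, prompt_name=None, prompts=PROMPTS):
    """Return the text as it would be fed to the model, with the prompt prefixed.

    An explicit `prompt` string wins; otherwise `prompt_name` is looked up.
    """
    if prompt is None and prompt_name is not None:
        prompt = prompts[prompt_name]
    return (prompt or "") + text


print(with_prompt("What are Pandas?", prompt_name="query"))
print(with_prompt("Pandas are a species of bear native to South Central China."))
```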

Various papers (INSTRUCTOR, BGE) show that including prompts or instructions both during training and inference results in stronger performance. As of this release, it's now possible to easily train with prompts in Sentence Transformers with just one extra training argument: prompts. There are 4 accepted formats for it:

str: A single prompt to use for all columns in all datasets. For example:

args = SentenceTransformerTrainingArguments(
    ...,
    prompts="text: ",
    ...,
)

Dict[str, str]: A dictionary mapping column names to prompts, applied to all datasets. For example:

args = SentenceTransformerTrainingArguments(
    ...,
    prompts={
        "query": "query: ",
        "answer": "document: ",
    },
    ...,
)

Dict[str, str]: A dictionary mapping dataset names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:

args = SentenceTransformerTrainingArguments(
    ...,
    prompts={
        "stsb": "Represent this text for semantic similarity search: ",
        "nq": "Represent this text for retrieval: ",
    },
    ...,
)

Dict[str, Dict[str, str]]: A dictionary mapping dataset names to dictionaries mapping column names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:

args = SentenceTransformerTrainingArguments(
    ...,
    prompts={
        "stsb": {
            "sentence1": "sts: ",
            "sentence2": "sts: ",
        },
        "nq": {
            "query": "query: ",
            "document": "document: ",
        },
    },
    ...,
)
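To make the four formats concrete, here is a rough sketch of how a prompt could be looked up for one column of one dataset. This is not the trainer's actual implementation. `resolve_prompt` is a hypothetical function, and telling formats 2 and 3 apart by checking the keys against the known dataset names is an assumption made for this example.

```python
def resolve_prompt(prompts, column, dataset_name=None, dataset_names=()):
    """Return the prompt for a column, or None if no prompt applies."""
    # Format 1: a single prompt for all columns in all datasets
    if isinstance(prompts, str):
        return prompts
    # Format 4: dataset name -> column name -> prompt
    if any(isinstance(value, dict) for value in prompts.values()):
        return prompts.get(dataset_name, {}).get(column)
    # Format 3: dataset name -> prompt (assumed when all keys are known dataset names)
    if dataset_names and all(key in dataset_names for key in prompts):
        return prompts.get(dataset_name)
    # Format 2: column name -> prompt, applied to all datasets
    return prompts.get(column)


names = ("stsb", "nq")
print(resolve_prompt("text: ", "query"))
print(resolve_prompt({"query": "query: ", "answer": "document: "}, "answer"))
print(resolve_prompt({"stsb": "sts: ", "nq": "retrieval: "}, "sentence1", "stsb", names))
print(resolve_prompt({"nq": {"query": "query: ", "document": "document: "}}, "query", "nq", names))
```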

I've trained models with and without prompts for 2 base models: mpnet-base and bert-base-uncased:

For both base models, the model with prompts consistently outperformed the baseline model. After training, the models with prompts resulted in a 0.66% and 0.90% relative improvement on NDCG@10 at no extra cost.

`mpnet-base` tests	`bert-base-uncased` tests

Training with Prompts documentation: https://sbert.net/examples/training/prompts/README.html
Training with Prompts training script: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/prompts/training_nq_prompts.py

NanoBEIR Evaluator integration (https://github.com/UKPLab/sentence-transformers/pull/2966)

This update introduced a new simple NanoBEIREvaluator, evaluating your model against NanoBEIR: a collection of subsets of the 13 BEIR datasets. BEIR corresponds to the retrieval tab of MTEB, and is commonly seen as a valuable indicator of general-purpose information retrieval performance.

With the NanoBEIREvaluator, you can easily evaluate your models on a much faster benchmark that should give similar insights into performance as BEIR. You can use it like so:

from sentence_transformers.evaluation import NanoBEIREvaluator
from sentence_transformers import SentenceTransformer
import logging

# Optional, but nice to get human-readable results in the terminal
logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO
)

# 1. Load a model
model = SentenceTransformer("all-mpnet-base-v2", backend="onnx")

# 2. Initialize the evaluator
evaluator = NanoBEIREvaluator()

# 3. Call the evaluator to get a dictionary of metric names to values
results = evaluator(model)
"""
NanoBEIR Evaluation of the model on ['climatefever', 'dbpedia', 'fever', 'fiqa2018', 'hotpotqa', 'msmarco', 'nfcorpus', 'nq', 'quoraretrieval', 'scidocs', 'arguana', 'scifact', 'touche2020'] dataset:
Evaluating NanoClimateFEVER
Information Retrieval Evaluation of the model on the NanoClimateFEVER dataset:
Queries: 50
Corpus: 3408

Score-Function: cosine
Accuracy@1: 24.00%
Accuracy@3: 36.00%
Accuracy@5: 44.00%
Accuracy@10: 66.00%
Precision@1: 24.00%
Precision@3: 14.00%
Precision@5: 10.40%
Precision@10: 9.00%
Recall@1: 9.50%
Recall@3: 17.33%
Recall@5: 22.90%
Recall@10: 36.07%
MRR@10: 0.3311
NDCG@10: 0.2618
MAP@100: 0.1982
Evaluating NanoDBPedia
Information Retrieval Evaluation of the model on the NanoDBPedia dataset:
Queries: 50
Corpus: 6045

Score-Function: cosine
Accuracy@1: 66.00%
Accuracy@3: 88.00%
Accuracy@5: 88.00%
Accuracy@10: 88.00%
Precision@1: 66.00%
Precision@3: 58.00%
Precision@5: 52.00%
Precision@10: 43.60%
Recall@1: 6.87%
Recall@3: 14.70%
Recall@5: 20.30%
Recall@10: 27.62%
MRR@10: 0.7533
NDCG@10: 0.5384
MAP@100: 0.3796
Evaluating NanoFEVER
Information Retrieval Evaluation of the model on the NanoFEVER dataset:
Queries: 50
Corpus: 4996

... (truncated for brevity)

Aggregated for Score Function: cosine
Accuracy@1: 52.87%
Accuracy@3: 71.35%
Accuracy@5: 78.45%
Accuracy@10: 85.07%
Precision@1: 52.87%
Recall@1: 30.28%
Precision@3: 33.78%
Recall@3: 47.93%
Precision@5: 26.23%
Recall@5: 55.04%
Precision@10: 18.07%
Recall@10: 62.54%
MRR@10: 0.6334
NDCG@10: 0.5758
"""

# 4. Print the results
print(evaluator.primary_metric)

# => "NanoBEIR_mean_cosine_ndcg@10"
print(results[evaluator.primary_metric])

# => 0.5758124378869705
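Since NDCG@10 is the primary metric here, a small reference computation may help when reading these numbers. This is a minimal sketch of NDCG@k with binary relevance (a document is either relevant or not); the evaluator's own implementation may differ in details, and the document ids below are made up.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1), ranks starting at 1."""
    return sum(gain / math.log2(index + 2) for index, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevant_doc_ids, k=10):
    """NDCG@k with binary relevance: DCG of the ranking divided by the best achievable DCG."""
    gains = [1.0 if doc_id in relevant_doc_ids else 0.0 for doc_id in ranked_doc_ids]
    ideal_gains = [1.0] * min(len(relevant_doc_ids), k)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Two relevant documents, retrieved at ranks 2 and 4
score = ndcg_at_k(["d3", "d1", "d7", "d2"], {"d1", "d2"})
# The same documents retrieved at ranks 1 and 2 give a perfect score
perfect = ndcg_at_k(["d1", "d2", "d3", "d7"], {"d1", "d2"})
print(score, perfect)
```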

Advanced Usage

You can also specify a subset of datasets, and you can specify query and/or corpus prompts, if your model uses them. For example:

import logging
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Optional, but nice to get human-readable results in the terminal
logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO
)

model = SentenceTransformer('intfloat/multilingual-e5-large-instruct')

datasets = ["QuoraRetrieval", "MSMARCO"]
query_prompts = {
    "QuoraRetrieval": "Instruct: Given a question, retrieve questions that are semantically equivalent to the given question\\nQuery: ",
    "MSMARCO": "Instruct: Given a web search query, retrieve relevant passages that answer the query\\nQuery: "
}

evaluator = NanoBEIREvaluator(
    dataset_names=datasets,
    query_prompts=query_prompts,
)

results = evaluator(model)
'''
NanoBEIR Evaluation of the model on ['QuoraRetrieval', 'MSMARCO'] dataset:
Evaluating NanoQuoraRetrieval
Information Retrieval Evaluation of the model on the NanoQuoraRetrieval dataset:
Queries: 50
Corpus: 5046

Score-Function: cosine
Accuracy@1: 92.00%
Accuracy@3: 98.00%
Accuracy@5: 100.00%
Accuracy@10: 100.00%
Precision@1: 92.00%
Precision@3: 40.67%
Precision@5: 26.00%
Precision@10: 14.00%
Recall@1: 81.73%
Recall@3: 94.20%
Recall@5: 97.93%
Recall@10: 100.00%
MRR@10: 0.9540
NDCG@10: 0.9597
MAP@100: 0.9395

Evaluating NanoMSMARCO
Information Retrieval Evaluation of the model on the NanoMSMARCO dataset:
Queries: 50
Corpus: 5043

Score-Function: cosine
Accuracy@1: 40.00%
Accuracy@3: 74.00%
Accuracy@5: 78.00%
Accuracy@10: 88.00%
Precision@1: 40.00%
Precision@3: 24.67%
Precision@5: 15.60%
Precision@10: 8.80%
Recall@1: 40.00%
Recall@3: 74.00%
Recall@5: 78.00%
Recall@10: 88.00%
MRR@10: 0.5849
NDCG@10: 0.6572
MAP@100: 0.5892
Average Queries: 50.0
Average Corpus: 5044.5

Aggregated for Score Function: cosine
Accuracy@1: 66.00%
Accuracy@3: 86.00%
Accuracy@5: 89.00%
Accuracy@10: 94.00%
Precision@1: 66.00%
Recall@1: 60.87%
Precision@3: 32.67%
Recall@3: 84.10%
Precision@5: 20.80%
Recall@5: 87.97%
Precision@10: 11.40%
Recall@10: 94.00%
MRR@10: 0.7694
NDCG@10: 0.8085
'''
print(evaluator.primary_metric)

# => "NanoBEIR_mean_cosine_ndcg@10"
print(results[evaluator.primary_metric])

# => 0.8084508771660436
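The metric name "NanoBEIR_mean_cosine_ndcg@10" and the numbers in the log suggest that the aggregated value is the arithmetic mean of the per-dataset values. A quick sanity check using the two NDCG@10 values printed above; the log rounds to four decimals, so the match is only approximate.

```python
from statistics import mean

# NDCG@10 per dataset, copied from the log output above (rounded to 4 decimals there)
per_dataset_ndcg_at_10 = {
    "NanoQuoraRetrieval": 0.9597,
    "NanoMSMARCO": 0.6572,
}

aggregated = mean(per_dataset_ndcg_at_10.values())
print(f"{aggregated:.5f}")  # compare with the reported 0.8084508771660436
```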

API Reference: NanoBEIREvaluator

PEFT compatibility (https://github.com/UKPLab/sentence-transformers/pull/3000, https://github.com/UKPLab/sentence-transformers/pull/2980, https://github.com/UKPLab/sentence-transformers/pull/3046)

Sentence Transformers has been integrated much more closely with PEFT. Notably, we introduce new methods on SentenceTransformer, such as add_adapter and load_adapter.

These methods allow you to add new PEFT adapters or load pretrained ones, for example:

Adding an adapter

from peft import LoraConfig, TaskType
from sentence_transformers import SentenceTransformer, SentenceTransformerModelCardData

# 1. Load a model to finetune, with (optional) model card data
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    model_card_data=SentenceTransformerModelCardData(
        language="en",
        license="apache-2.0",
        model_name="all-MiniLM-L6-v2 adapter finetuned on GooAQ pairs",
    ),
)

# 2. Create a LoRA adapter for the model & add it
peft_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model.add_adapter(peft_config)

# Proceed as usual... See https://sbert.net/docs/sentence_transformer/training_overview.html
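To see why adapters stay small: LoRA, as commonly described, keeps a weight matrix frozen and learns a low-rank update made of two thin matrices, scaled by lora_alpha / r. The sketch below only does the arithmetic for the r=8, lora_alpha=32 configuration above. The 384x384 layer size is an illustrative assumption, and the helper names are hypothetical.

```python
def lora_extra_parameters(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Factor applied to the low-rank update before it is added to the frozen weight."""
    return lora_alpha / r


d_in = d_out = 384  # assumed layer size, for illustration only
full = d_in * d_out
extra = lora_extra_parameters(d_in, d_out, r=8)
print(f"frozen weights: {full}, trainable LoRA weights: {extra} ({extra / full:.1%})")
print("scaling:", lora_scaling(lora_alpha=32, r=8))
```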

Loading a pretrained adapter

Given sentence-transformers-testing/stsb-bert-tiny-lora as a small adapter model (the adapter_model.safetensors file is only 33.8kB!) on top of sentence-transformers-testing/stsb-bert-tiny-safetensors, you can either load this adapter directly:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-lora")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)

# (2, 128)

Or you can load the original model and load the adapter into it:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-safetensors")
model.load_adapter("sentence-transformers-testing/stsb-bert-tiny-lora")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)

# (2, 128)

Transformers v4.46.0 compatibility (https://github.com/UKPLab/sentence-transformers/pull/3026, https://github.com/UKPLab/sentence-transformers/pull/3035, https://github.com/UKPLab/sentence-transformers/pull/3037, https://github.com/UKPLab/sentence-transformers/pull/3038)

The recent transformers v4.46.0 update introduced a few changes that were incompatible with Sentence Transformers. For example:

Use the "processing_class" argument instead of "tokenizer"
Add a num_items_in_batch argument to the compute_loss method in the Trainer
Adding a ValueError if eval_dataset is None while eval_strategy is not "no" (this should be possible in Sentence Transformers, as we accept evaluating with just an evaluator as well)

These issues and deprecation warnings have been resolved.
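If you maintain your own Trainer subclass and need to support both older and newer transformers versions, a version gate along these lines is one option. This is a sketch, not the code used in Sentence Transformers: `tokenizer_kwarg_name` is a hypothetical helper, and a real project would more likely compare versions with `packaging.version`.

```python
def tokenizer_kwarg_name(transformers_version: str) -> str:
    """Pick the keyword argument name for passing a tokenizer to the Trainer.

    Only the major and minor components are compared, so "4.46.0.dev0" also works.
    """
    major, minor = (int(part) for part in transformers_version.split(".")[:2])
    return "processing_class" if (major, minor) >= (4, 46) else "tokenizer"


for version in ("4.45.2", "4.46.0"):
    print(version, "->", tokenizer_kwarg_name(version))
```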

Drop Python 3.8 support (https://github.com/UKPLab/sentence-transformers/pull/3033)

Given that Python 3.8 has now reached its end of life, Sentence Transformers will no longer support it.

All Changes

[peft] If AutoModel is wrapped with PEFT for prompt learning, then extend the attention mask by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3000
[integration] Add support for Transformers v4.46.0 by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3026
add an ImportError to tell the user that datasets must be install to fit a model by @h4c5 in https://github.com/UKPLab/sentence-transformers/pull/3020
[feat] Integrate NanoBeIR datasets; use model.similarity by default in evaluators by @ArthurCamara in https://github.com/UKPLab/sentence-transformers/pull/2966
Fix model name typo in example by @programmer-ke in https://github.com/UKPLab/sentence-transformers/pull/3028
Support OpenVINO int8 static quantization by @l-bat in https://github.com/UKPLab/sentence-transformers/pull/3025
[fix] Avoid passing eval_dataset=None to transformers due to >=v4.46.0 crash by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3035
[docs] Update the dated example in the NanoBEIREvaluator by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3034
[deprecate] Drop Python 3.8 support due to EOL by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3033
[tests] Remove evaluation_steps from model.fit test without evaluator by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3037
[fix] Fix loading pre-exported OV/ONNX model if export=False by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3036
[chore] If Transformers 4.46.0, use processing_class instead of tokenizer when saving by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3038
[docs] Add some missing docs for include_prompt in Pooling by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3042
[feat] Trainer with prompts and prompt masking by @ArthurCamara in https://github.com/UKPLab/sentence-transformers/pull/2964
[fix] Fix model loading inconsistency after Peft training by using PeftModel by @pesuchin in https://github.com/UKPLab/sentence-transformers/pull/2980
[enh] Add Support for multiple adapters on Transformers-based models by @carlesonielfa in https://github.com/UKPLab/sentence-transformers/pull/3046 & https://github.com/UKPLab/sentence-transformers/pull/2993
Moved Model Card Callback init in Trainer to a separate function by @tRosenflanz in https://github.com/UKPLab/sentence-transformers/pull/3047

New Contributors

@h4c5 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3020
@programmer-ke made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3028
@l-bat made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3025
@carlesonielfa made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3046
@tRosenflanz made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3047

Special Thanks

Big thanks to @ArthurCamara for leading the work on both 1) training with prompts and 2) NanoBEIR.

Full Changelog: UKPLab/sentence-transformers@v3.2.1...v3.3.0

`v3.2.1`: - Patch CLIP loading, small ONNX fix, compatibility with other libraries

Compare Source

This patch release fixes some small bugs, such as those related to loading CLIP models and automatic model card generation, and ensures compatibility with third-party libraries.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.2.1

# Inference only, use one of:
pip install sentence-transformers==3.2.1
pip install sentence-transformers[onnx-gpu]==3.2.1
pip install sentence-transformers[onnx]==3.2.1
pip install sentence-transformers[openvino]==3.2.1

Fixing Loading non-Transformer models

In v3.2.0, a non-Transformer based model (e.g. CLIP) would not load correctly if the model was saved in the root of the model repository/directory. This has been resolved in #3007.

Throw error if `StaticEmbedding`-based model is finetuned with incompatible losses

The following losses are not compatible with StaticEmbedding-based models:

CachedGISTEmbedLoss
CachedMultipleNegativesRankingLoss
CachedMultipleNegativesSymmetricRankingLoss
DenoisingAutoEncoderLoss
GISTEmbedLoss

An error is now thrown when one of these is used with a StaticEmbedding-based model. I recommend using MultipleNegativesRankingLoss to finetune these models, e.g. as in https://huggingface.co/tomaarsen/static-bert-uncased-gooaq.
Note: to get good performance, you must use much higher learning rates than otherwise. In my experiments, 2e-1 worked well.
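If you build training configurations programmatically, you can fail early with a similar check of your own. A minimal sketch: the loss names come from the list above, while `check_loss_compatibility` and the choice of ValueError are assumptions for illustration and may not match the exception the library raises.

```python
# Loss class names listed above as incompatible with StaticEmbedding-based models
INCOMPATIBLE_WITH_STATIC_EMBEDDING = frozenset({
    "CachedGISTEmbedLoss",
    "CachedMultipleNegativesRankingLoss",
    "CachedMultipleNegativesSymmetricRankingLoss",
    "DenoisingAutoEncoderLoss",
    "GISTEmbedLoss",
})


def check_loss_compatibility(loss_name: str, uses_static_embedding: bool) -> None:
    """Raise early if a StaticEmbedding-based model is paired with an incompatible loss."""
    if uses_static_embedding and loss_name in INCOMPATIBLE_WITH_STATIC_EMBEDDING:
        raise ValueError(
            f"{loss_name} is not compatible with StaticEmbedding-based models; "
            "consider MultipleNegativesRankingLoss instead."
        )


check_loss_compatibility("MultipleNegativesRankingLoss", uses_static_embedding=True)  # passes
try:
    check_loss_compatibility("GISTEmbedLoss", uses_static_embedding=True)
except ValueError as error:
    print(error)
```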

Patch ONNX model when the model uses `output_hidden_states`

For example, this script used to fail, but passes now:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "distiluse-base-multilingual-cased",
    backend="onnx",
    model_kwargs={"provider": "CPUExecutionProvider"},
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings.shape)

All changes

Bump optimum version by @echarlaix in https://github.com/UKPLab/sentence-transformers/pull/2984
[docs] Update the training snippets for some losses that should use the v3 Trainer by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2987
[enh] Throw error if StaticEmbedding-based model is trained with incompatible loss by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2990
[fix] Fix semantic_search_usearch with 'binary' by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2989
[enh] Add support for large_string in model card create by @yaohwang in https://github.com/UKPLab/sentence-transformers/pull/2999
[model cards] Prevent crash on generating widgets if dataset column is empty by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2997
[fix] Added model2vec import compatible with current and newer version by @Pringled in https://github.com/UKPLab/sentence-transformers/pull/2992
Fix cache_dir issue with loading CLIPModel by @BoPeng in https://github.com/UKPLab/sentence-transformers/pull/3007
[warn] Throw a warning if compute_metrics is set, as it's not used by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3002
[fix] Prevent IndexError if output_hidden_states & ONNX by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3008

New Contributors

@echarlaix made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2984
@yaohwang made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2999
@Pringled made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2992
@BoPeng made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3007

Full Changelog: UKPLab/sentence-transformers@v3.2.0...v3.2.1

`v3.2.0`: - ONNX and OpenVINO backends offering 2-3x speedup; Static Embeddings offering 50x-500x speedups at ~10-20% performance cost

Compare Source

This release introduces 2 new efficient computing backends for SentenceTransformer models: ONNX and OpenVINO + optimization & quantization, allowing for speedups up to 2x-3x; static embeddings via Model2Vec allowing for lightning-fast models (i.e., 50x-500x speedups) at a ~10%-20% performance cost; and various small improvements and fixes.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.2.0

# Inference only, use one of:
pip install sentence-transformers==3.2.0
pip install sentence-transformers[onnx-gpu]==3.2.0
pip install sentence-transformers[onnx]==3.2.0
pip install sentence-transformers[openvino]==3.2.0

Faster ONNX and OpenVINO Backends for SentenceTransformer (#2712)

Introducing a new backend keyword argument to the SentenceTransformer initialization, allowing values of "torch" (default), "onnx", and "openvino".
These come with new installations:

pip install sentence-transformers[onnx-gpu]

# or ONNX for CPU only:
pip install sentence-transformers[onnx]

# or
pip install sentence-transformers[openvino]

It's as simple as:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)

If you specify a backend and your model repository or directory contains an ONNX/OpenVINO model file, it will automatically be used! And if your model repository or directory doesn't have one already, an ONNX/OpenVINO model will be automatically exported. Just remember to model.push_to_hub or model.save_pretrained into the same model repository or directory to avoid having to re-export the model every time.

All keyword arguments passed via model_kwargs will be passed on to ORTModel.from_pretrained or OVBaseModel.from_pretrained. The most useful arguments are:

provider: (Only if backend="onnx") ONNX Runtime provider to use for loading the model, e.g. "CPUExecutionProvider". See https://onnxruntime.ai/docs/execution-providers/ for possible providers. If not specified, the strongest provider (e.g. "CUDAExecutionProvider") will be used.
file_name: The name of the ONNX or OpenVINO file to load. If not specified, will default to "model.onnx" or otherwise "onnx/model.onnx" for ONNX, and "openvino_model.xml" or otherwise "openvino/openvino_model.xml" for OpenVINO. This argument is useful for specifying optimized or quantized models.
export: A boolean flag specifying whether the model will be exported. If not provided, export will be set to True if the model repository or directory does not already contain an ONNX or OpenVINO model.

For example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={
        "file_name": "model_O3.onnx",
        "provider": "CPUExecutionProvider",
    }
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
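The default file lookup and the export default described above can be pictured roughly as follows. This sketch only handles local directories and is not the library's implementation; the function names are hypothetical, and the candidate order is taken from the file_name description.

```python
import tempfile
from pathlib import Path

# Default candidates per backend, in the order given in the file_name description
DEFAULT_CANDIDATES = {
    "onnx": ("model.onnx", "onnx/model.onnx"),
    "openvino": ("openvino_model.xml", "openvino/openvino_model.xml"),
}


def find_default_model_file(model_dir, backend):
    """Return the first default file that exists in model_dir, or None."""
    for candidate in DEFAULT_CANDIDATES[backend]:
        if (Path(model_dir) / candidate).is_file():
            return candidate
    return None


def should_export(model_dir, backend, export=None):
    """An explicit export flag wins; otherwise export only if no model file was found."""
    if export is not None:
        return export
    return find_default_model_file(model_dir, backend) is None


with tempfile.TemporaryDirectory() as model_dir:
    print(should_export(model_dir, "onnx"))  # empty directory, nothing exported yet
    (Path(model_dir) / "onnx").mkdir()
    (Path(model_dir) / "onnx" / "model.onnx").touch()
    print(find_default_model_file(model_dir, "onnx"), should_export(model_dir, "onnx"))
```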

Benchmarks

We ran benchmarks for CPU and GPU, averaging findings across 4 models of various sizes, 3 datasets, and numerous batch sizes. Here are the findings:

These findings resulted in these recommendations:

For GPU, you can expect 2x speedup with fp16 at no cost, and for CPU you can expect ~2.5x speedup at a cost of 0.4% accuracy.

ONNX Optimization and Quantization

In addition to exporting default ONNX and OpenVINO models, we also introduce 2 helper methods for optimizing and quantizing ONNX models:

Optimization

export_optimized_onnx_model: This function uses Optimum to implement several optimizations in the ONNX model, ranging from basic optimizations to approximations and mixed precision. Read about the 4 default options here. This function accepts:

model: A SentenceTransformer model loaded with backend="onnx".
optimization_config: "O1", "O2", "O3", or "O4" from 🤗 Optimum or a custom OptimizationConfig instance.
model_name_or_path: The directory or model repository where the optimized model will be saved.
push_to_hub: Whether to push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for optimizing models of repositories that you don't have write access to.
file_suffix: The suffix to add to the optimized model file name. Will use the optimization_config string or "optimized" if not set.

The usage is like this:

from sentence_transformers import SentenceTransformer, export_optimized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_optimized_onnx_model(
    model=onnx_model,
    optimization_config="O4",
    model_name_or_path="BAAI/bge-large-en-v1.5",
    push_to_hub=True,
    create_pr=True,
)

After which you can load the model with:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_O4.onnx"},
   revision=f"refs/pr/{pull_request_nr}"
)

or when it gets merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_O4.onnx"},
)

Quantization

export_dynamic_quantized_onnx_model: This function uses Optimum to quantize the ONNX model to int8, also allowing for hardware-specific optimizations. This results in impressive speedups for CPUs. In my findings, each of the default quantization configuration options gave approximately the same performance improvements. This function accepts

model: A SentenceTransformer model loaded with backend="onnx".
quantization_config: "arm64", "avx2", "avx512", or "avx512_vnni" representing quantization configurations from AutoQuantizationConfig, or a QuantizationConfig instance.
model_name_or_path: The directory or model repository where the quantized model will be saved.
push_to_hub: Whether to push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for quantizing models of repositories that you don't have write access to.
file_suffix: The suffix to add to the quantized model file name. Will use the quantization_config string or e.g. "int8_quantized" if not set.

The usage is like this:

from sentence_transformers import SentenceTransformer, export_dynamic_quantized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_dynamic_quantized_onnx_model(
    model=onnx_model,
    quantization_config="avx512",
    model_name_or_path="BAAI/bge-large-en-v1.5",
    push_to_hub=True,
    create_pr=True,
)

After which you can load the model with:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
   revision=f"refs/pr/{pull_request_nr}"
)

or when it gets merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
)

Lightning-Fast Static Embeddings via Model2Vec (#2961)

If ONNX or OpenVINO isn't fast enough for you yet, then perhaps you'll enjoy Static Embeddings. These embeddings are a bit akin to GloVe or word2vec, i.e. they're bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks.
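To illustrate the "bag of token embeddings" idea, here is a toy version in plain Python: every token has one fixed vector, and a text embedding is simply the sum of its token vectors. The three-dimensional vectors and the whitespace tokenizer are made up for this sketch; real static embedding models use learned vectors and a proper tokenizer.

```python
import math

# Made-up token vectors; a real model stores one learned vector per vocabulary token
TOKEN_VECTORS = {
    "paris": [1.0, 0.0, 0.0],
    "capital": [0.8, 0.2, 0.0],
    "france": [0.9, 0.1, 0.0],
    "netherlands": [0.0, 1.0, 0.0],
    "people": [0.0, 0.7, 0.3],
}


def embed(text):
    """Sum the vectors of all known tokens; no neural network forward pass is involved."""
    total = [0.0, 0.0, 0.0]
    for token in text.lower().split():
        vector = TOKEN_VECTORS.get(token)
        if vector is not None:
            total = [t + v for t, v in zip(total, vector)]
    return total


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


query = embed("capital of France")
related = cosine(query, embed("Paris is the capital of France"))
unrelated = cosine(query, embed("people in the Netherlands"))
print(related, unrelated)
```

Because embedding a text is only a table lookup plus a sum, the cost is tiny compared to running a transformer, which is where the large speedups come from.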

However, these Static Embeddings are created in different ways. For example:

Distillation via the Model2Vec technique. This project allows you to distill any Sentence Transformer model into Static Embeddings. For example, distilling BAAI/bge-base-en-v1.5 resulted in a Static Embeddings Sentence Transformer model that reaches 87.5% of the performance of all-MiniLM-L6-v2 on MTEB (+ PEARL & WordSim) and 97.4% of the performance of all-MiniLM-L6-v2 on various classification benchmarks.
You can initialize Static Embeddings via Model2Vec in two ways:
- from_model2vec: You can load one of the pretrained Model2Vec models:

```python
# note: `pip install model2vec` is needed, but not for inference
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Initialize a Sentence Transformer model with a static embedding from a pretrained model2vec model
static_embedding = StaticEmbedding.from_model2vec("minishlab/M2V_multilingual_output")
model = SentenceTransformer(modules=[static_embedding])

# Encode some texts
queries = ["What is the capital of France?", "How many people live in the Netherlands?"]
documents = ["Paris is the capital of France", "The Netherlands has 17 million inhabitants"]
query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Compute similarities
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
"""
tensor([[0.8170, 0.3843],
        [0.3929, 0.5818]])
"""
```
* [`from_distillation`](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.StaticEmbedding.from_distillation): You can use the name of any Sentence Transformer model alongside some parameters (see [the docs](https://redirect.github.com/MinishLab/model2vec#distilling-a-model2vec-model) for more information) to perform the distillation yourself, without needing any dataset. On my device, this takes ~4s on a GPU and ~2 minutes on a CPU:
```python
# note: `pip install model2vec` is needed, but not for inference
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Initialize a Sentence Transformer model with a static embedding by distilling via model2vec
static_embedding = StaticEmbedding.from_distillation(
    "mixedbread-ai/mxbai-embed-large-v1",
    device="cuda",
    pca_dims=256,
    apply_zipf=True,
)
model = SentenceTransformer(modules=[static_embedding])

# Encode some texts
queries = ["What is the capital of France?", "How many people live in the Netherlands?"]
documents = ["Paris is the capital of France", "The Netherlands has 17 million inhabitants"]
query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Compute similarities
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
"""
tensor([[0.8430, 0.3271],
        [0.3213, 0.5861]])
"""
```

Random initialization: Although this initialization needs finetuning, finetuning a Sentence Transformers model backed by StaticEmbedding is extremely fast. For example, I was able to finetune tomaarsen/static-bert-uncased-gooaq with MatryoshkaLoss & MultipleNegativesRankingLoss on the entire (3 million pairs) gooaq dataset in just 7 minutes. This model reaches an NDCG@10 of 79.33 on a hold-out set of 10k samples from gooaq, whereas e.g. BAAI/bge-base-en-v1.5 reaches 85.01 NDCG@10. In short, only 6.6% less performance for a model that's about 500x faster.
That's not a typo: I can compute embeddings for about 14000 stsb sentences per second on CPU, compared to about 24 with BAAI/bge-base-en-v1.5, a.k.a. 625x faster.
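The trade-off quoted above can be reproduced as a quick back-of-the-envelope calculation. This sketch uses only the rounded figures from the text; the variable names are illustrative. With these rounded inputs the results come out at roughly 6.7% and 583x, in the same ballpark as the quoted figures, which presumably derive from unrounded measurements.

```python
# Figures as quoted in the release notes (rounded there).
static_ndcg = 79.33            # NDCG@10 of the static embedding model
bge_ndcg = 85.01               # NDCG@10 of BAAI/bge-base-en-v1.5
static_sentences_per_sec = 14000
bge_sentences_per_sec = 24

# Relative quality drop of the static model versus the transformer baseline.
relative_drop = (bge_ndcg - static_ndcg) / bge_ndcg * 100

# Throughput ratio on CPU.
speedup = static_sentences_per_sec / bge_sentences_per_sec

print(f"Relative NDCG@10 drop: {relative_drop:.1f}%")
print(f"CPU speedup: {speedup:.0f}x")
```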

> [!NOTE]
> You can save_pretrained and load these models like any other Sentence Transformer model; the StaticEmbedding initialization is only necessary when you're creating a new model.

Creation:

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Initialize a Sentence Transformer model with a static embedding by distilling via model2vec
static_embedding = StaticEmbedding.from_distillation(
    "mixedbread-ai/mxbai-embed-large-v1",
    device="cuda",
    pca_dims=256,
    apply_zipf=True,
)
model = SentenceTransformer(modules=[static_embedding])
model.save_pretrained("static-mxbai-embed-large-v1")
# or
# model.push_to_hub("tomaarsen/static-mxbai-embed-large-v1")

Inference:

from sentence_transformers import SentenceTransformer

# Initialize a Sentence Transformer model with a static embedding
model = SentenceTransformer("static-mxbai-embed-large-v1")

model.encode([...])

Small changes

The InformationRetrievalEvaluator now accepts query_prompt, query_prompt_name, corpus_prompt, and corpus_prompt_name arguments, useful if your model requires specific prompts for queries and/or documents for the best performance. (#2951)
The mine_hard_negatives function now accepts anchor_column_name and positive_column_name for specifying which dataset columns will be used. If not specified, the first two columns are used, respectively. Additionally, the min_score parameter is added, ensuring that all mined negatives have a similarity score of at least min_score according to the chosen SentenceTransformer or CrossEncoder model. (#2977)
If you're using multiple evaluators during training via SequentialEvaluator, e.g. multiple evaluators for different Matryoshka dimensions, then the order is now preserved in the training logs in the model card. Previously, they were sorted by name, resulting in weird orderings (e.g. "gooaq-1024", "gooaq-128", "gooaq-256", "gooaq-32", "gooaq-512", "gooaq-64") (#2963)
CachedGISTEmbedLoss has been improved to support multiple negatives per sample, i.e. the loss now accepts data in the (anchor, positive, negative_1, …, negative_n) format. It is the third loss to support this format (see docs).
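The evaluator ordering issue mentioned in the list above is easy to reproduce: sorting names as strings compares them character by character, so "1024" lands before "128". A small sketch, assuming the evaluators were passed from largest to smallest Matryoshka dimension (the input order is an assumption for illustration):

```python
# Evaluator names from the example above, in an assumed insertion order.
evaluator_names = ["gooaq-1024", "gooaq-512", "gooaq-256", "gooaq-128", "gooaq-64", "gooaq-32"]

# Old behaviour as described: sorted by name, i.e. lexicographically.
sorted_by_name = sorted(evaluator_names)

# New behaviour as described: the order the evaluators were given in is kept.
preserved_order = list(evaluator_names)

print(sorted_by_name)
print(preserved_order)
```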

All changes

[fix] Only save first module in root if "save_in_root" is specified. by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2957
[feat] Add query prompts to Information Retrieval Evaluator by @ArthurCamara in https://github.com/UKPLab/sentence-transformers/pull/2951
[model cards] Keep evaluation order in training logs if there's multiple evaluators by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2963
Add negatives in CachedGISTEmbedLoss by @daegonYu in https://github.com/UKPLab/sentence-transformers/pull/2946
[ENH] -- CrossEncoder.rank by @it176131 in https://github.com/UKPLab/sentence-transformers/pull/2947
[feat] Add lightning-fast StaticEmbedding module based on model2vec by @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2961
[feat] Add ONNX and OpenVINO backends by @helena-intel and @tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2712
Refine mine_hard_negatives arguments by @bakrianoo in https://github.com/UKPLab/sentence-transformers/pull/2977

New Contributors

@daegonYu made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2946
@it176131 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2947
@helena-intel made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2712
@bakrianoo made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2977

Special thanks to @echarlaix for making the new backends possible due to some last-minute changes in optimum and optimum-intel.

Full Changelog: UKPLab/sentence-transformers@v3.1.1...v3.2.0

`v3.1.1`: - Patch hard negative mining & remove `numpy<2` restriction

Compare Source

This patch release fixes hard negatives mining for models that don't automatically normalize their embeddings and it lifts the numpy<2 restriction that was previously required.
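As background on why normalization can matter for similarity search, here is a minimal sketch. It assumes candidates are scored by raw inner product, which is an assumption on my part since the notes don't spell out how the FAISS path scores them. Inner product only agrees with cosine similarity once vectors have unit length, so unnormalized embeddings can change the ranking. All vectors and names below are made up.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def rank_by_inner_product(query, candidates, normalize_first):
    """Return candidate names ordered by inner product with the query, best first."""
    if normalize_first:
        query = normalize(query)
        candidates = {name: normalize(vec) for name, vec in candidates.items()}
    return sorted(candidates, key=lambda name: dot(query, candidates[name]), reverse=True)


query = [1.0, 0.0]
candidates = {
    "same_direction_short": [1.0, 0.0],      # cosine 1.0 with the query
    "different_direction_long": [3.0, 4.0],  # cosine 0.6, but a much larger norm
}

raw_ranking = rank_by_inner_product(query, candidates, normalize_first=False)
normalized_ranking = rank_by_inner_product(query, candidates, normalize_first=True)
print(raw_ranking)
print(normalized_ranking)
```

Without normalization the longer vector wins on magnitude alone; after normalization the ranking matches cosine similarity.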

Install this version with

##### Full installation:
pip install sentence-transformers[train]==3.1.1

##### Inference only:
pip install sentence-transformers==3.1.1

Hard Negatives Mining Patch (#2944)

The mine_hard_negatives utility introduced in the previous release would fail if use_faiss=True & the model does not automatically normalize its embeddings. This release patches that, allowing the utility to work with all Sentence Transformer models:

from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

##### Load a Sentence Transformer model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1").bfloat16()

##### Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train[:10000]")
print(dataset)
"""
Dataset({
    features: ['query', 'answer'],
    num_rows: 10000
})
"""

##### Mine hard negatives
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    range_min=10,
    range_max=50,
    max_score=0.8,
    margin=0.1,
    num_negatives=5,
    sampling_strategy="random",
    batch_size=128,
    use_faiss=True,
)
'''
Batches: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:21<00:00,  3.51it/s]
Batches: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/LibrePhotos/librephotos).

sonarcloud · 2024-07-04T15:47:02Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

renovate bot force-pushed the renovate/sentence_transformers-3.x branch from 07368a3 to 3c92933 Compare June 16, 2024 11:35

renovate bot force-pushed the renovate/sentence_transformers-3.x branch from 3c92933 to 7868afe Compare July 4, 2024 15:46

renovate bot force-pushed the renovate/sentence_transformers-3.x branch from 7868afe to 7109494 Compare July 10, 2024 15:58

renovate bot force-pushed the renovate/sentence_transformers-3.x branch 2 times, most recently from fbfb287 to eaa847e Compare September 20, 2024 09:44

renovate bot force-pushed the renovate/sentence_transformers-3.x branch from eaa847e to 7e1a61c Compare October 10, 2024 18:37

renovate bot force-pushed the renovate/sentence_transformers-3.x branch 2 times, most recently from 834f49c to cbe02a5 Compare October 22, 2024 17:49

renovate bot force-pushed the renovate/sentence_transformers-3.x branch from cbe02a5 to e7c3596 Compare November 11, 2024 13:53

chore(deps): update dependency sentence_transformers to v3

a0b1406

renovate bot force-pushed the renovate/sentence_transformers-3.x branch from e7c3596 to a0b1406 Compare November 18, 2024 15:35

