docs: Moved training results to results directory, updated docs and description (#187)

* Refactored results

* Refactored results

* Updated docs

* Updated docs

* Updated docs

* Updated description
Pringled authored Feb 12, 2025
1 parent 5c205e7 commit 486e2bf
Showing 4 changed files with 53 additions and 43 deletions.
10 changes: 6 additions & 4 deletions README.md
@@ -7,7 +7,7 @@
</div>

<div align="center">
<h2>The Fastest State-of-the-Art Static Embeddings in the World</h2>
<h2>Fast State-of-the-Art Static Embeddings</h2>
</div>

<div align="center">
@@ -103,7 +103,7 @@ from datasets import load_dataset
from model2vec.train import StaticModelForClassification

# Initialize a classifier from a pre-trained model
classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-8M")
classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-32M")

# Load a dataset
ds = load_dataset("setfit/subj")
@@ -120,7 +120,7 @@ For advanced usage, please refer to our [usage documentation](https://github.com

## Updates & Announcements

- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and in our [blog post](LINK).
- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](results/README.md#training-results).

- **30/01/2024**: We released two new models: [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) and [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M). [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) is our most performant model to date, using a larger vocabulary and higher dimensions. [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) is a finetune of [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) that is optimized for retrieval tasks, and is the best performing static retrieval model currently available.

@@ -133,6 +133,7 @@ For advanced usage, please refer to our [usage documentation](https://github.com
- **Lightweight Dependencies**: the base package's only major dependency is `numpy`.
- **Lightning-fast Inference**: up to 500 times faster on CPU than the original model.
- **Fast, Dataset-free Distillation**: distill your own model in 30 seconds on a CPU, without a dataset.
- **Fine-tuning**: fine-tune your own classification models on top of Model2Vec models.
- **Integrated in many popular libraries**: Model2Vec is integrated directly into popular libraries such as [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) and [LangChain](https://github.com/langchain-ai/langchain). For more information, see our [integrations documentation](https://github.com/MinishLab/model2vec/blob/main/docs/integrations.md).
- **Tightly integrated with HuggingFace hub**: easily share and load models from the HuggingFace hub, using the familiar `from_pretrained` and `push_to_hub`. Our own models can be found [here](https://huggingface.co/minishlab).

@@ -173,6 +174,7 @@ We provide a number of models that can be used out of the box. These models are

We have performed extensive experiments to evaluate the performance of Model2Vec models. The results are documented in the [results](results/README.md) folder. The results are presented in the following sections:
- [MTEB Results](results/README.md#mteb-results)
- [Training Results](results/README.md#training-results)
- [Ablations](results/README.md#ablations)

## License
@@ -185,7 +187,7 @@ If you use Model2Vec in your research, please cite the following:
```bibtex
@software{minishlab2024model2vec,
author = {Stephan Tulkens and Thomas van Dongen},
title = {Model2Vec: The Fastest State-of-the-Art Static Embeddings in the World},
title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
year = {2024},
url = {https://github.com/MinishLab/model2vec}
}
36 changes: 0 additions & 36 deletions model2vec/train/README.md
@@ -92,42 +92,6 @@ pipeline = StaticModelPipeline.from_pretrained("my_cool/project")

Loading pipelines in this way is _extremely_ fast. It takes only 30ms to load a pipeline from disk.

# Results

The main results are detailed in our training blogpost, but we'll do a comparison with vanilla model2vec here. In a vanilla model2vec classifier, you just put a scikit-learn `LogisticRegressionCV` on top of the model encoder. In contrast, training a `StaticModelForClassification` fine-tunes the full model, including the `StaticModel` weights. The SetFit model is trained using [all-minilm-l6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as a base model.

We use 15 classification datasets, using 1000 examples from the train set, and the full test set. No parameters were tuned on any validation set. All datasets were taken from the [SetFit organization on Hugging Face](https://huggingface.co/datasets/SetFit).

| dataset | model2vec + logreg | model2vec full finetune | setfit |
|:---------------------------|----------------------------------------------:|---------------------------------------:|-------------------------------------------------:|
| 20_newgroups | 56.24 | 57.94 | 61.29 |
| ade | 79.2 | 79.68 | 83.05 |
| ag_news | 86.7 | 87.2 | 88.01 |
| amazon_counterfactual | 90.96 | 91.93 | 95.51 |
| bbc | 95.8 | 97.21 | 96.6 |
| emotion | 65.57 | 67.11 | 72.86 |
| enron_spam | 96.4 | 96.85 | 97.45 |
| hatespeech_offensive | 83.54 | 85.61 | 87.69 |
| imdb | 85.34 | 85.59 | 86 |
| massive_scenario | 82.86 | 84.42 | 83.54 |
| senteval_cr | 77.03 | 79.47 | 86.15 |
| sst5 | 32.34 | 37.95 | 42.31 |
| student | 83.2 | 85.02 | 89.62 |
| subj | 89.2 | 89.85 | 93.8 |
| tweet_sentiment_extraction | 64.96 | 62.65 | 75.15 |

| | logreg | full finetune | setfit |
|:---------------------------|-----------:|---------------:|-------:|
| average | 77.9 | 79.2 | 82.6 |

As you can see, full fine-tuning brings modest performance improvements in some cases, but very large ones in other cases, leading to a pretty large increase in average score. Our advice is to test both if you can use `potion-base-32m`, and to use full fine-tuning if you are starting from another base model.

The speed difference between model2vec and setfit is immense, with the full finetune being 35x faster than a setfit based on `all-minilm-l6-v2` on CPU.

| | logreg | full finetune | setfit |
|:---------------------------|-----------:|---------------:|-------:|
| samples / second | 17925 | 24744 | 716 |


# Bring your own architecture

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "model2vec"
description = "The Fastest State-of-the-Art Static Embeddings in the World"
description = "Fast State-of-the-Art Static Embeddings"
readme = { file = "README.md", content-type = "text/markdown" }
license = { file = "LICENSE" }
requires-python = ">=3.9"
48 changes: 46 additions & 2 deletions results/README.md
@@ -1,7 +1,8 @@
# Results

This page contains the experiments results of the Model2Vec project. The results are presented in the following sections:
This document contains the results of the Model2Vec project. The results are presented in the following sections:
- [MTEB Results](#mteb-results)
- [Training Results](#training-results)
- [Ablations](#ablations)

## MTEB Results
@@ -51,7 +52,7 @@ NOTE: for fairness of comparison, we disabled multiprocessing for Model2Vec for
|*Figure: The average MTEB score plotted against sentences per second. The circle size indicates model size.*|


## Retrieval Results
### Retrieval Results

A subset of models we created and compare against are specifically designed for retrieval tasks. The results are shown in the table below, including two general-purpose models for comparison and a transformer.

@@ -65,6 +66,49 @@ A subset of models we created and compare against are specifically designed for

As can be seen, the [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) model is the most performant static retrieval model, reaching 86.65% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) with a retrieval score of 36.35.

## Training Results

The main results for Model2Vec training are outlined in this section.

We compare three different architectures:
- `model2vec + logreg`: A model2vec model with a scikit-learn `LogisticRegressionCV` on top.
- `model2vec full finetune`: A model2vec classifier with the full model finetuned. This uses our `StaticModelForClassification`.
- `setfit`: A [SetFit](https://github.com/huggingface/setfit/tree/main) model trained using [all-minilm-l6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as a base model.

We use 15 classification datasets, using 1000 examples from the train set, and the full test set. No parameters were tuned on any validation set. All datasets were taken from the [SetFit organization on Hugging Face](https://huggingface.co/datasets/SetFit).

| dataset | model2vec + logreg | model2vec full finetune | setfit |
|:---------------------------|----------------------------------------------:|---------------------------------------:|-------------------------------------------------:|
| 20_newgroups | 56.24 | 57.94 | 61.29 |
| ade | 79.2 | 79.68 | 83.05 |
| ag_news | 86.7 | 87.2 | 88.01 |
| amazon_counterfactual | 90.96 | 91.93 | 95.51 |
| bbc | 95.8 | 97.21 | 96.6 |
| emotion | 65.57 | 67.11 | 72.86 |
| enron_spam | 96.4 | 96.85 | 97.45 |
| hatespeech_offensive | 83.54 | 85.61 | 87.69 |
| imdb | 85.34 | 85.59 | 86 |
| massive_scenario | 82.86 | 84.42 | 83.54 |
| senteval_cr | 77.03 | 79.47 | 86.15 |
| sst5 | 32.34 | 37.95 | 42.31 |
| student | 83.2 | 85.02 | 89.62 |
| subj | 89.2 | 89.85 | 93.8 |
| tweet_sentiment_extraction | 64.96 | 62.65 | 75.15 |

| | logreg | full finetune | setfit |
|:---------------------------|-----------:|---------------:|-------:|
| average | 77.9 | 79.2 | 82.6 |

As can be seen, full fine-tuning brings modest improvements on most datasets and larger ones on a few (up to +5.61 points on sst5), with one regression on tweet_sentiment_extraction, raising the average score from 77.9 to 79.2. Our advice is to test both if you can use `potion-base-32M`, and to use full fine-tuning if you are starting from another base model.
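
The averages and per-dataset differences can be re-derived from the table above. The following is a minimal sketch in plain Python: the scores are copied from the table, while the names `scores`, `means`, and `gains` are illustrative and not part of Model2Vec.

```python
# Scores copied from the table above, per dataset:
# (model2vec + logreg, model2vec full finetune, setfit)
scores = {
    "20_newgroups": (56.24, 57.94, 61.29),
    "ade": (79.2, 79.68, 83.05),
    "ag_news": (86.7, 87.2, 88.01),
    "amazon_counterfactual": (90.96, 91.93, 95.51),
    "bbc": (95.8, 97.21, 96.6),
    "emotion": (65.57, 67.11, 72.86),
    "enron_spam": (96.4, 96.85, 97.45),
    "hatespeech_offensive": (83.54, 85.61, 87.69),
    "imdb": (85.34, 85.59, 86.0),
    "massive_scenario": (82.86, 84.42, 83.54),
    "senteval_cr": (77.03, 79.47, 86.15),
    "sst5": (32.34, 37.95, 42.31),
    "student": (83.2, 85.02, 89.62),
    "subj": (89.2, 89.85, 93.8),
    "tweet_sentiment_extraction": (64.96, 62.65, 75.15),
}

# Unweighted mean of each column across all datasets.
rows = list(scores.values())
means = tuple(sum(row[i] for row in rows) / len(rows) for i in range(3))

# Gain of the full finetune over the logreg baseline, per dataset.
gains = {name: round(ft - lr, 2) for name, (lr, ft, _) in scores.items()}
best = max(gains, key=gains.get)
worst = min(gains, key=gains.get)

print("datasets:", len(scores))
print("means (logreg, full finetune, setfit):", means)
print("largest gain:", best, gains[best])
print("smallest gain:", worst, gains[worst])
```

The recomputed means should land within 0.1 of the reported averages; small differences are expected if the reported figures were truncated or computed from unrounded scores.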

The speed difference between model2vec and setfit is large: the full finetune is roughly 35x faster than a setfit model based on `all-minilm-l6-v2` on CPU (24744 vs. 716 samples per second).

| | logreg | full finetune | setfit |
|:---------------------------|-----------:|---------------:|-------:|
| samples / second | 17925 | 24744 | 716 |
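
The speedup factor follows directly from the throughput figures in the table. As a small worked example in plain Python (the dictionary names are illustrative only):

```python
# Throughput in samples per second, copied from the table above.
throughput = {"logreg": 17925, "full finetune": 24744, "setfit": 716}

# Speedup of each approach relative to the setfit baseline.
speedup = {name: rate / throughput["setfit"] for name, rate in throughput.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```

Rounded to the nearest integer, the full finetune comes out at about 35x and the logreg variant at about 25x relative to setfit.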



## Ablations

To better understand the factors contributing to the performance of Model2Vec, we conducted a comprehensive set of ablation studies, covering various aspects of the model's architecture and preprocessing methods. In these studies, we examined the impact of key elements such as PCA, Zipf weighting, and the use of Sentence Transformers versus regular transformer models. We also compared the performance of input embeddings versus output embeddings, since it would seem plausible that these should also work well. The results are shown in the table below.
