SPLADE

These are the experiment files for SPLADE models, developed within experimaestro-ir (XPMIR). The models are described in the paper Towards Effective and Efficient Sparse Neural Information Retrieval:

BibTeX record

```bibtex
@article{10.1145/3634912,
author = {Formal, Thibault and Lassance, Carlos and Piwowarski, Benjamin and Clinchant, St\'{e}phane},
title = {Towards Effective and Efficient Sparse Neural Information Retrieval},
year = {2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1046-8188},
url = {https://doi.org/10.1145/3634912},
doi = {10.1145/3634912},
abstract = {Sparse representation learning based on Pre-trained Language Models has seen a growing interest in Information Retrieval. Such approaches can take advantage of the proven efficiency of inverted indexes, and inherit desirable IR priors such as explicit lexical matching or some degree of interpretability. In this work, we thoroughly develop the framework of sparse representation learning in IR, which unifies term weighting and expansion in a supervised setting. We then build on SPLADE – a sparse expansion-based retriever – and show to which extent it is able to benefit from the same training improvements as dense bi-encoders, by studying the effect of distillation, hard negative mining as well as the Pre-trained Language Model’s initialization on its effectiveness – leading to state-of-the-art results in both in- and out-of-domain evaluation settings (SPLADE++). We furthermore propose efficiency improvements, allowing us to reach latency requirements on par with traditional keyword-based approaches (Efficient-SPLADE).},
note = {Just Accepted},
journal = {ACM Trans. Inf. Syst.},
month = {dec},
keywords = {Sparse Representations, Information Retrieval, Efficiency, Effectiveness}
}
```

Run it

```sh
# See below for the list of possible values of NAME
experimaestro run-experiment splade/NAME.yaml
```
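
As a minimal sketch, the snippet below fills in `NAME` with `debug`, one of the configurations listed under "Implemented models". It assumes that the listed name `splade/debug` corresponds to the file `splade/debug.yaml`, and that the command is run from the repository root. The `NAME` and `CONFIG` variables are illustrative only, and any extra options your experimaestro setup needs are not shown.

```sh
# Pick one of the configurations listed under "Implemented models".
# "debug" runs only a few training steps and indexes part of the documents.
NAME=debug
CONFIG="splade/${NAME}.yaml"   # assumed mapping from the listed name to a file

# Print the command first so it can be checked before launching.
echo "experimaestro run-experiment ${CONFIG}"

# Launch only if the CLI is installed and the configuration file exists.
if command -v experimaestro >/dev/null 2>&1 && [ -f "${CONFIG}" ]; then
    experimaestro run-experiment "${CONFIG}"
fi
```

Starting with the `debug` configuration is a quick way to check that the environment works before launching the full `normal_DistilMSE` experiment.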

Implemented models

splade/normal_DistilMSE: SPLADE-max with distillation (one of the best-performing models)
splade/debug: the same SPLADE-max with distillation model, but running only a few training steps and indexing only part of the documents
