Model parameters can be specified in a JSON configuration file, which is passed at training time. JSON parameter names should match the constructor parameters of the respective model.
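To illustrate how JSON keys map onto constructor parameters, here is a minimal sketch. The class `ToyRanker`, its parameter names, and the helper `build_model_from_json` are all hypothetical and only stand in for a real model class; the actual loading code in the library may differ.

```python
import inspect
import json


class ToyRanker:
    """Hypothetical stand-in for a model class (not part of the library)."""

    def __init__(self, bert_flavor="bert-base-uncased", max_query_len=32, max_doc_len=445):
        self.bert_flavor = bert_flavor
        self.max_query_len = max_query_len
        self.max_doc_len = max_doc_len


def build_model_from_json(model_class, json_text):
    """Instantiate model_class using JSON keys as constructor keyword arguments."""
    params = json.loads(json_text)
    # Names the constructor accepts, excluding `self`.
    accepted = set(inspect.signature(model_class.__init__).parameters) - {"self"}
    unknown = sorted(set(params) - accepted)
    if unknown:
        # A misspelled key should fail loudly rather than be silently ignored.
        raise ValueError(f"Unknown model parameters: {unknown}")
    return model_class(**params)


# Assumed example config: keys mirror the (hypothetical) constructor parameters.
config_text = '{"bert_flavor": "microsoft/deberta-v3-base", "max_doc_len": 1024}'
model = build_model_from_json(ToyRanker, config_text)
```

Parameters omitted from the JSON keep their constructor defaults, and a key that does not match any constructor parameter raises an error in this sketch.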
-
Vanilla BERT (FirstP) ranker. This is a CEDR variant of the FirstP ranker, which truncates long inputs and pads queries. The truncation length and the backbone (flavor of BERT) are both configurable, so it can be used with models such as Longformer and DeBERTa V2/V3.
Nogueira, Rodrigo, and Kyunghyun Cho. "Passage Re-ranking with BERT." arXiv:1901.04085 (2019).
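The truncate-and-pad idea behind FirstP can be sketched as follows. This is a simplified illustration on plain lists of token ids, not the library's implementation: the function name, the length limits, and the special-token ids (`cls_id`, `sep_id`, `pad_id`) are assumptions chosen for the example.

```python
def first_p_input(query_ids, doc_ids, max_query_len=8, max_total_len=16,
                  cls_id=101, sep_id=102, pad_id=0):
    """Build a single [CLS] query [SEP] doc [SEP] sequence of bounded length."""
    # Truncate the query, then pad it to a fixed length.
    query = query_ids[:max_query_len]
    query = query + [pad_id] * (max_query_len - len(query))

    # Three positions are reserved for [CLS] and the two [SEP] tokens.
    doc_budget = max_total_len - max_query_len - 3
    if doc_budget < 1:
        raise ValueError("max_total_len leaves no room for document tokens")

    # FirstP: keep only the first part of the document, drop the rest.
    doc = doc_ids[:doc_budget]
    return [cls_id] + query + [sep_id] + doc + [sep_id]


# A 3-token query and a 100-token document (ids are arbitrary).
sequence = first_p_input([5, 6, 7], list(range(1000, 1100)))
```

With these illustrative limits the document contributes only its first five tokens; raising `max_total_len` is what allows long-input backbones to see more of the document.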
-
Dot-product models from the Sentence-BERT library. One can use any "dot-product" model from the Sentence Transformers library out of the box and fine-tune it on their own data. The main citation for this library is given below, but make sure you also cite the model-specific paper if there is one: Reimers, Nils, et al. "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks."
-
COLBERT (v2) re-ranking model. A COLBERT (v2) model that can be used as an efficient re-ranker. It also has strong zero-shot performance.
Various chunk-and-aggregate models for ranking of long documents, including:
-
PARADE models, original and improved.
Li, C., Yates, A., MacAvaney, S., He, B., & Sun, Y. (2020). PARADE: Passage representation aggregation for document reranking. arXiv:2008.09093.
Boytsov, L., Lin, T., Gao, F., Zhao, Y., Huang, J., & Nyberg, E. (2022). Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding. arXiv:2207.01262.
-
MacAvaney, Sean, et al. "CEDR: Contextualized embeddings for document ranking." SIGIR 2019.
-
Dai, Zhuyun, and Jamie Callan. "Deeper text understanding for IR with contextual neural language modeling." SIGIR 2019.
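The chunk-and-aggregate idea shared by the models above can be sketched at the score level: split a long document into overlapping chunks, score each chunk against the query, and combine the chunk scores. The sketch below uses max aggregation and a toy term-overlap scorer in place of a neural model; all function names, the chunk length, and the stride are assumptions for illustration. PARADE-style models aggregate passage representations rather than scores, which this sketch does not cover.

```python
def split_into_chunks(tokens, chunk_len, stride):
    """Split tokens into overlapping windows of at most chunk_len tokens."""
    if chunk_len <= 0 or stride <= 0:
        raise ValueError("chunk_len and stride must be positive")
    chunks = []
    for start in range(0, max(len(tokens), 1), stride):
        chunks.append(tokens[start:start + chunk_len])
        if start + chunk_len >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


def toy_overlap_score(query_tokens, chunk_tokens):
    """Toy scorer (stand-in for a neural ranker): count chunk tokens found in the query."""
    query_set = set(query_tokens)
    return sum(1 for token in chunk_tokens if token in query_set)


def score_long_document(query_tokens, doc_tokens, score_chunk,
                        chunk_len=4, stride=2, aggregate=max):
    """Score every chunk and combine the chunk scores with `aggregate`."""
    chunks = split_into_chunks(doc_tokens, chunk_len, stride)
    return aggregate(score_chunk(query_tokens, chunk) for chunk in chunks)


query = ["e", "f", "z"]
document = "a b c d e f g h".split()
max_score = score_long_document(query, document, toy_overlap_score)
sum_score = score_long_document(query, document, toy_overlap_score, aggregate=sum)
```

Passing `aggregate=sum` instead of `max` switches the sketch from taking the best chunk to adding up all chunk scores, which shows how the aggregation step is the main design choice once chunking is fixed.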