
Datasets

Nils Feldhus edited this page Jul 5, 2021 · 7 revisions

SQuAD

Name 🤗 Tested
BERT bert-large-uncased-whole-word-masking-finetuned-squad
DistilBERT distilbert-base-uncased-distilled-squad

SQuAD 2.0

Name 🤗 Tested
ALBERT mfeb/albert-xxlarge-v2-squad2
BERT deepset/bert-large-uncased-whole-word-masking-squad2
ELECTRA deepset/electra-base-squad2
MiniLM deepset/minilm-uncased-squad2
RoBERTa deepset/roberta-base-squad2
XLNet jkgrad/xlnet-base-squadv2

QQP

QQP is a paraphrase identification dataset with two classes and 390,965 examples; it is part of the GLUE benchmark.

Name 🤗 lgxa lig lime occ svs
ALBERT textattack/albert-base-v2-QQP
BERT textattack/bert-base-uncased-QQP
ELECTRA howey/electra-base-qqp
XLNet textattack/xlnet-base-cased-QQP

TREC

TREC is a question classification dataset with 6 classes.

Name 🤗 Tested
BERT aychang/bert-base-cased-trec-coarse
DistilBERT aychang/distilbert-base-cased-trec-coarse

SST-2

SST-2 is a sentiment analysis dataset with 2 classes and is part of the GLUE benchmark.
Labels are not publicly available for its test set.

Name 🤗 lgxa lig lime occ svs
ALBERT (albert) textattack/albert-base-v2-SST-2
BERT (bert) textattack/bert-base-uncased-SST-2
ELECTRA (electra) howey/electra-base-sst2
RoBERTa (roberta) textattack/roberta-base-SST-2
XLNet (xlnet) textattack/xlnet-base-cased-SST-2
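Because the SST-2 test labels are withheld, any evaluation has to run on a labeled split, which is why the validation split is commonly used instead. The sketch below filters out unlabeled examples. It assumes examples are dicts with a `label` field and that withheld labels carry the placeholder `-1` (the convention of the Hugging Face `datasets` GLUE loader); the helper `labeled_only` is hypothetical and the sentences are toy data.

```python
from typing import Dict, List

# Assumed placeholder for a withheld label (the Hugging Face `datasets`
# GLUE loader uses -1 for test-split labels).
UNLABELED = -1


def labeled_only(examples: List[Dict]) -> List[Dict]:
    """Hypothetical helper: keep only examples with a real SST-2 label (0 or 1)."""
    return [ex for ex in examples if ex["label"] != UNLABELED]


# Toy examples for illustration only.
split = [
    {"sentence": "a gorgeous, witty film", "label": 1},
    {"sentence": "an unlabeled test sentence", "label": UNLABELED},
]
print(len(labeled_only(split)))  # 1
```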

HANS

Apply the MNLI-trained models (already in place).
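MNLI models predict three classes (entailment, neutral, contradiction), whereas HANS only distinguishes entailment from non-entailment, so a 3-way prediction has to be mapped onto the 2-way label space. A minimal sketch, assuming the model output has already been turned into a label-name to score dict (label names and their order depend on each checkpoint, so check its `config.id2label`); the function `mnli_to_hans` is hypothetical. It takes the 3-way argmax first and then merges neutral and contradiction into non-entailment.

```python
from typing import Dict


def mnli_to_hans(probs: Dict[str, float]) -> str:
    """Hypothetical helper: map a 3-way MNLI prediction to a HANS label.

    `probs` is assumed to map the label names "entailment", "neutral"
    and "contradiction" to scores (probabilities or logits).
    """
    # Take the 3-way argmax first ...
    predicted = max(probs, key=probs.get)
    # ... then merge neutral and contradiction into non-entailment.
    return "entailment" if predicted == "entailment" else "non-entailment"


print(mnli_to_hans({"entailment": 0.4, "neutral": 0.35, "contradiction": 0.25}))  # entailment
print(mnli_to_hans({"entailment": 0.2, "neutral": 0.5, "contradiction": 0.3}))  # non-entailment
```

Merging after the argmax is not the same as summing the neutral and contradiction probabilities: in the first call the summed non-entailment mass (0.6) exceeds 0.4, yet the argmax, and therefore the returned label, is entailment.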

PAWS

Apply the QQP-trained models (see the QQP table above).
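QQP and PAWS are both binary paraphrase tasks, so a QQP model's predictions can be scored against PAWS gold labels directly. A minimal sketch, assuming the usual label conventions (QQP in GLUE: 0 = not_duplicate, 1 = duplicate; PAWS: 0 = not paraphrase, 1 = paraphrase), which should be verified against each checkpoint's `id2label`; `QQP_TO_PAWS` and `paws_accuracy` are hypothetical names.

```python
from typing import List

# Assumed label conventions (verify per checkpoint):
# QQP 0 = not_duplicate, 1 = duplicate; PAWS 0 = not paraphrase, 1 = paraphrase.
QQP_TO_PAWS = {0: 0, 1: 1}  # hypothetical identity mapping


def paws_accuracy(qqp_preds: List[int], paws_labels: List[int]) -> float:
    """Hypothetical helper: accuracy of QQP-model predictions on PAWS labels."""
    if len(qqp_preds) != len(paws_labels):
        raise ValueError("predictions and labels must have the same length")
    if not paws_labels:
        raise ValueError("need at least one example")
    hits = sum(QQP_TO_PAWS[p] == y for p, y in zip(qqp_preds, paws_labels))
    return hits / len(paws_labels)


print(paws_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Keeping the mapping explicit, rather than comparing raw indices, makes it easy to adapt if a checkpoint orders its labels differently.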
