Skip to content

Latest commit

 

History

History
16 lines (10 loc) · 852 Bytes

README.md

File metadata and controls

16 lines (10 loc) · 852 Bytes

CATS

Commonsense Ability Tests

Dataset and script for paper Evaluating Commonsense in Pre-trained Language Models

Use making_sense.py to run the experiments:
For ordinary tests:
python making_sense.py ca bert nr

For robust tests:
python making_sense.py ca bert r

Note that ca is the name of the task and bert is the model we are using. The default model is bert-base-uncased. To use bert-large, just modify the from_pretrained('bert-base-uncased') in the code. For more details, see Huggingface Transformers.

Due to the updating of Huggingface scripts and some of our datasets, some numbers we showed in the paper may not exactly match the what you might get by rerunning the experiments. However, the conclusion should be the same.