Implementation of the EMNLP 2020 paper "Counterfactual Generator: A Weakly-Supervised Method for Named Entity Recognition".
We propose the Counterfactual Generator, which generates counterfactual examples by the interventions on the existing observational examples to enhance the original dataset by breaking the entangled correlations between the spurious features and the non-spurious fearures. Experiments across three datasets show that our method improves the generalization ability of models under limited observational examples.
sudo apt install python3-pip
pip3 install virtualenv
virtualenv -p /usr/bin/python3 env
source env/bin/activate
pip install -r requirements.txt
It is important to know that we do not provide the link or data of the dataset IDiag
, because it is a private dataset only for the INTERNAL USE.
-
Train a model on a given dataset
python app.py train --gpu "[0]" --model "bilstm" --dataset "cluener" --seed 100
-
Training with all settings
python app.py trainall --gpu "[0]" --models "['bilstm', 'bert']" --datasets "['cluener', 'cner']"
@inproceedings{zeng-etal-2020-counterfactual,
title = "Counterfactual Generator: A Weakly-Supervised Method for Named Entity Recognition",
author = "Zeng, Xiangji and
Li, Yunliang and
Zhai, Yuchen and
Zhang, Yin",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-main.590",
pages = "7270--7280",
}
CFGen is CC-BY-NC 4.0.
If you have any questions, please contact me [email protected] or create a Github issue.