Skip to content

Latest commit

 

History

History
170 lines (121 loc) · 6.68 KB

README.md

File metadata and controls

170 lines (121 loc) · 6.68 KB

COFIE: COVID-19 Open Functional Information Extraction

This repository contains models, datasets and experiments described in Extracting a Knowledge Base of Mechanisms from COVID-19 Papers.

  • Please cite our paper if you use our datasets or models in your project. See the BibTeX.
  • Feel free to email us.

COFIE / COFIE-G datasets

We provide two annotated datasets:

  • COFIE: Coarse-grained mechanism relations (Direct and Indirect)
  • COFIE-G: Granular mechanism relations (Subject-Predicate-Object)

From project root, run scripts/data/get_cofie.sh to download both datasets to the data directory.

  • COFIE will be downloaded to data/cofie/[train,dev,test].json. Development and test sets for are also available in tabular format: data/cofie-gold/[dev,test]-gold.tsv
  • COFIE-G will be downloaded to data/cofie-g/split/[train,dev,test].json. Tabular format:data/cofie-g-gold/[dev,test]-gold.tsv

Pretrained models

We provide models pre-trained on COFIE and COFIE-G.

Downloads

From project root, run scripts/pretrained/get_cofie_pretrained.sh to download all the available pretrained models to the pretrained directory. If you only want one model, here are the download links.

Table of Contents

Dependencies

This code repository is forked from DYGIE++, Wadden 2019.

This code was developed using Python 3.7. To create a new Conda environment using Python 3.7, do conda create --name cofie python=3.7.

This library relies on AllenNLP and uses AllenNLP shell commands to kick off training, evaluation, and testing.

We use the Allentune for hyperparameter search. For installing a compatible version of the Allentune library, please download the allentune git repo outside of dygiepp directory using:

git clone https://github.com/allenai/allentune.git

Then replace the files provided in this repository using command

cp -r allentune_files/[location of downloaded allentune]

The you can proceed with installing allentune by running

pip install --editable .

in allentune downloaded folder.

After installing allentune please proceed with installing required libraries for DyGIE++. The necessary dependencies can be installed with

pip install -r requirements.txt

Making predictions on existing datasets

To make a prediction, you can use allennlp predict. For example, to make a prediction with a pretrained granular relation model:

allennlp predict pretrained/ternary-model.tar.gz \
    data/cofie-g/split/test.json \
    --predictor dygie \
    --include-package dygie \
    --use-dataset-reader \
    --output-file predictions/cofie-g-test.jsonl \
    --cuda-device 0 \
    --silent

For predicting coarse relations using a pretrained model:

allennlp predict pretrained/binary.tar.gz \
    data/cofie/test.json \
    --predictor dygie \
    --include-package dygie \
    --use-dataset-reader \
    --output-file predictions/cofie-test.jsonl \
    --cuda-device 0 \
    --silent

Running these commands will provide json-formatted predictions.

Alternatively you can use the predict scripts provided by this library to generate both .tsv and .json file. You can use :

python predict_binary.py --data_dir data/cofie --device 0 --serial_dir pretrained/binary-model.tar.gz  --pred_dir predictions/cofie-test/

for coarse relation predictions and

python predict_ternary.py --data_dir data/cofie-g/collated --device 0 --serial_dir pretrained/ternary-model.tar.gz  --pred_dir predictions/cofie-t-test/

for granular relation predictions.

Relation extraction evaluation metric

We report Precision/Recall/F1 measured by using exact and partial span-matching functions. Full details are described in our paper.

Training with Allentune

We use Allentune for hyperparameter tuning. To train a model for coarse relation extraction using Allentune, you can run the script below.

python scripts/train/train_allentune.py --data_dir data/cofie --device 0,1,2,3 --serial_dir models/cofie/ --gpu_count 4 --cpu_count 12 --device 0,1,2,3

To train the model for granular relations:

python scripts/train/train_event_allentune.py --data_dir data/processed/collated_events/ --serial_dir ./models/events --gpu_count 4 --cpu_count 12 --device 0,1,2,3

The default number of training samples is set to 30. For more training options please use the --h command.

To obtain predictions for the development set over all Allentune runs:

python predict.py --data_dir data/cofie --device 0 --serial_dir models/cofie/

for the coarse relation model and

python predict_event_allentune.py --serial_dir ./models/cofie-t --data_dir ./data/cofie-t/ --pred_dir ./predictions/cofie-t

for the granular relation model.

You can get test set predcitions by indicating only the run index you want to use for inference:

python predict.py --data_dir data/cofie --device 0,1,2,3 --serial_dir models/cofie/  --pred_dir predictions/cofie

for coarse relations and

python predict_event_allentune.py --serial_dir ./models/cofie-t --data_dir ./data/cofie-t/ --pred_dir ./predictions/cofie-t --test_data --test_index 17

for granular relations.

Citation

If using our dataset and models, please cite:

@inproceedings{amini-hope-2020-cofie,
    title={{Extracting a Knowledge Base of Mechanisms from COVID-19 Papers
}},
    author={Tom Hope and Aida Amini and David Wadden and Madeleine van Zuylen and E. Horvitz and Roy Schwartz and Hannaneh Hajishirzi},
    year={2020},
    url={https://arxiv.org/pdf/2010.03824.pdf}
}

Contact us

Please don't hesitate to reach out.

Email: [email protected], [email protected]