Code for our paper Levi Graph AMR Parser using Heterogeneous Attention, published at IWPT 2021: The 17th International Conference on Parsing Technologies.
This paper presents a novel approach to AMR parsing that combines heterogeneous data (tokens, concepts, labels) into a single input to a transformer to learn attention, and uses only the attention matrices from the transformer to predict all elements in AMR graphs (concepts, arcs, labels). Although our models use significantly fewer parameters than the previous state-of-the-art graph parser, they show similar or better accuracy on AMR 2.0 and 3.0.
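To make the idea of a single heterogeneous input more concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: the function names, the example sentence, and the uniform stand-in attention matrix are all assumptions made for illustration. It only shows how tokens, concepts, and labels could be concatenated into one sequence, and how one attention matrix over that sequence can be sliced into blocks relating one segment to another.

```python
# Illustrative sketch only: names and layout are assumptions, not the repository's code.

def build_input(tokens, concepts, labels):
    """Concatenate the three segments and record the index span of each."""
    sequence = tokens + concepts + labels
    spans = {}
    start = 0
    for name, segment in (("tokens", tokens), ("concepts", concepts), ("labels", labels)):
        spans[name] = (start, start + len(segment))
        start += len(segment)
    return sequence, spans


def attention_block(attention, spans, rows, cols):
    """Slice the sub-matrix where `rows` positions attend to `cols` positions."""
    r0, r1 = spans[rows]
    c0, c1 = spans[cols]
    return [row[c0:c1] for row in attention[r0:r1]]


# Hypothetical example sentence with hand-picked concepts and labels.
tokens = ["the", "boy", "wants", "to", "go"]
concepts = ["want-01", "boy", "go-02"]
labels = ["ARG0", "ARG1"]

sequence, spans = build_input(tokens, concepts, labels)
n = len(sequence)

# Stand-in for one attention matrix from a transformer: uniform weights.
attention = [[1.0 / n] * n for _ in range(n)]

# Blocks relating one kind of element to another.
concept_to_token = attention_block(attention, spans, "concepts", "tokens")
concept_to_concept = attention_block(attention, spans, "concepts", "concepts")
label_to_concept = attention_block(attention, spans, "labels", "concepts")

print(spans)
print(len(concept_to_token), len(concept_to_token[0]))
```

Which blocks feed which prediction (concepts, arcs, labels) is described in the paper; the slicing above only illustrates that a single attention matrix over the combined sequence contains all of these pairwise relations.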
The env.sh script will create an environment with all the dependencies. It installs torch==1.7.1+cu110 by default, but other similar versions should also work.
source scripts/env.sh
Model | Smatch | Link |
---|---|---|
AMR2.0+Levi | 80.2 | amr2_levi_80.2.zip |
AMR3.0+Levi | 70.2 | amr3_levi_70.2.zip |
The following instructions (modified from AMR-gs) assume that you're working on AMR 2.0 (LDC2017T10). For AMR 3.0, the procedure is similar.
- Unzip the corpus to data/AMR/LDC2017T10.
- Prepare training/dev/test splits:
  sh scripts/prepare_data.sh -v 2 -p data/AMR/LDC2017T10
- Download artifacts:
  sh scripts/download_artifacts.sh
- Feature annotation: we use Stanford CoreNLP (version 3.9.2) for lemmatizing, POS tagging, etc.
  sh scripts/run_standford_corenlp_server.sh
  sh scripts/annotate_features.sh data/AMR/amr_2.0
- Data preprocessing:
  sh scripts/preprocess_2.0.sh
- Building vocabs:
  sh scripts/prepare_vocab.sh data/AMR/amr_2.0 true
sh scripts/train_joint.sh data/AMR/amr_2.0 # ND + AD + BD, aka cross-attention model
sh scripts/train_levi.sh data/AMR/amr_2.0 # ND + AD + LV, aka Levi model
The training process will produce many checkpoints and the corresponding output on the dev set.
To select the best checkpoint during training, one can evaluate the dev output files using the following script. It also uses the official Smatch script to evaluate the best checkpoint on the test set.
python amr_parser/eval_dev_test.py -h
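The selection step described above amounts to picking the checkpoint with the highest dev Smatch. The sketch below illustrates that idea only; it is not how eval_dev_test.py is implemented, and the function name, checkpoint names, and scores are hypothetical.

```python
# Hypothetical sketch: checkpoint names and scores are made up for illustration.

def select_best_checkpoint(dev_scores):
    """Return the (checkpoint, score) pair with the highest dev Smatch.

    Ties are broken in favor of the checkpoint that appears first.
    """
    if not dev_scores:
        raise ValueError("no dev scores to compare")
    best = None
    for checkpoint, score in dev_scores.items():
        if best is None or score > best[1]:
            best = (checkpoint, score)
    return best


dev_scores = {
    "epoch10": 0.781,
    "epoch20": 0.795,
    "epoch30": 0.792,
}

best_checkpoint, best_score = select_best_checkpoint(dev_scores)
print(best_checkpoint, best_score)
```

Only the checkpoint chosen on the dev set should then be scored on the test set, so that the test score is not used for model selection.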
To evaluate a single checkpoint on the test set, use the following script.
python amr_parser/eval_test.py -h
- We adopted code snippets from stog for data preprocessing.
- The sanitizing script for unconnected Levi graphs is very preliminary and occasionally causes evaluation to fail on some checkpoints.
- The online dbpedia-spotlight service no longer works, so we recommend running it locally:
wget https://od.hankcs.com/research/iwpt2021/spotlight.zip
unzip spotlight.zip
sh scripts/run_spotlight.sh
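Regarding the note above on unconnected Levi graphs: if evaluation fails on a checkpoint, it can help to check whether a predicted graph is connected before scoring it. The following is a minimal sketch of such a check, not the repository's sanitizing script. The function name and the example graph are assumptions; the only property it relies on is that in a Levi graph, edge labels are nodes of their own.

```python
from collections import deque


def connected_components(nodes, edges):
    """Group nodes into components, treating edges as undirected."""
    neighbors = {node: set() for node in nodes}
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen = set()
    components = []
    for node in nodes:
        if node in seen:
            continue
        # Breadth-first search from every node not yet visited.
        queue = deque([node])
        seen.add(node)
        component = []
        while queue:
            current = queue.popleft()
            component.append(current)
            for neighbor in sorted(neighbors[current]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical Levi graph: the labels ARG0 and ARG1 are nodes, and "city" is left unattached.
nodes = ["want-01", "ARG0", "boy", "ARG1", "go-02", "city"]
edges = [
    ("want-01", "ARG0"),
    ("ARG0", "boy"),
    ("want-01", "ARG1"),
    ("ARG1", "go-02"),
]

components = connected_components(nodes, edges)
is_connected = len(components) == 1
print(is_connected, components)
```

A graph with more than one component would need to be repaired or filtered before being passed to the evaluation script; how to repair it is left open here.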
If you use this repository in your research, please cite our IWPT 2021 paper:
@inproceedings{he-choi:2021:iwpt,
Author = {He, Han and Choi, Jinho D.},
Booktitle = {Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies},
Publisher = {Association for Computational Linguistics},
Title = {Levi Graph AMR Parser using Heterogeneous Attention},
Year = {2021},
}