
Erratum (EMNLP 2021 Paper)

Siyu Tao edited this page Jul 1, 2022 · 1 revision

Erratum

Donatelli, L., Schmidt, T., Biswas, D., Köhn, A., Zhai, F., & Koller, A. (2021). Aligning Actions Across Recipe Graphs. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 6930–6942. https://doi.org/10.18653/v1/2021.emnlp-main.554

This page provides a preliminary list of corrigenda for our paper Aligning Actions Across Recipe Graphs (EMNLP 2021) as we work towards a revised version of the paper for submission to the ACL Anthology. We would especially like to thank Dhaivat Bhatt and Afsaneh Fazly at Samsung AI Research for their help in identifying the inconsistent reporting of results in our paper.

1. Evaluation Methodology

In §3.2 and §3.3, we compared the performance of our tagger and parser with that of [1] and listed the results in Table 1 and Table 2, respectively. However, we would like to note that whereas [1] used ten-fold cross-validation, we randomly split their data 238:29:30 (train:dev:test) and evaluated our tagger and parser on this single fold, since we do not have access to their folds. This is mentioned in Appendix A.1 of our paper, but in the interest of clarity it should have been stated more prominently in the main body.
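For illustration, a single random split of the 297 recipes into fixed-size train/dev/test portions can be produced as sketched below. This is a hypothetical sketch, not the script we actually used; the function name, the seed, and the use of integer recipe IDs are all assumptions for the example.

```python
import random

def split_recipes(recipe_ids, sizes=(238, 29, 30), seed=0):
    """Randomly partition recipe IDs into fixed-size train/dev/test sets.

    Note: sizes, seed, and integer IDs are illustrative assumptions;
    this is not the original splitting script.
    """
    assert sum(sizes) == len(recipe_ids), "sizes must cover all recipes"
    ids = list(recipe_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for a fixed seed
    n_train, n_dev, _ = sizes
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]

train, dev, test = split_recipes(range(297))
print(len(train), len(dev), len(test))  # 238 29 30
```

Note that without recording the seed (or the resulting ID lists), such a split cannot be reconstructed later, which is exactly the pitfall described in §2 below.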

2. Data Splits

We make available here the trained models that we used to produce the results in the paper: https://drive.google.com/drive/folders/17CIqJpZffonPLlkQXKMsRnAhiDo4-_Z5

Unfortunately, we inadvertently re-split the data and no longer have the exact splits used to train the above models, so the published numbers cannot be reproduced exactly. Please refer to the repository for the results of the above models as evaluated on the current data splits. While the results remain broadly comparable, we sincerely regret this oversight.

3. Model Pretraining

In §3.2, we observed that the tagger performed better with ELMo than with BERT as the pre-trained embedder, contrary to expectations. However, we overlooked the fact that the ELMo weights we used had already been fine-tuned for NER on the CoNLL-2003 dataset. This explains the "counterintuitive" result, as the BERT model we used was not fine-tuned for the NER task.

The ELMo NER model is from [2], available for download at https://allennlp.s3.amazonaws.com/models/ner-model-2018.12.18.tar.gz.

References

[1] Yamakata, Y., Mori, S., & Carroll, J. (2020). English Recipe Flow Graph Corpus. Proceedings of the 12th Language Resources and Evaluation Conference, 5187–5194. https://aclanthology.org/2020.lrec-1.638

[2] Peters, M. E., Ammar, W., Bhagavatula, C., & Power, R. (2017). Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1756–1765. https://doi.org/10.18653/v1/P17-1161
