
Universal-NLU/DRASTIC


The DRASTIC corpus

CC BY 4.0

Dataset and scripts for the DRASTIC corpus of DRS-annotated texts.

The repository has the following structure:

├── data
│   ├── drs-annotation
│   │   ├── anaphora-resolution
│   │   │   ├── dvorak
│   │   │   ├── marbles
│   │   │   ├── nida
│   │   │   └── short-texts
│   │   └── no-anaphora-resolution
│   │       ├── dvorak
│   │       ├── marbles
│   │       ├── nida
│   │       └── short-texts
│   └── ud-sources
│       ├── dvorak
│       ├── marbles
│       ├── nida
│       └── short-texts
└── scripts

data contains the DRS annotations (drs-annotation), in clausal format, together with the corresponding Universal Dependencies (UD) sources (ud-sources) from the GUM corpus. The semantic annotations come in two versions: one with sentence-internal anaphora resolved (anaphora-resolution) and one without (no-anaphora-resolution). Within each of these directories, the texts are divided by sub-corpus (dvorak, marbles, nida, short-texts), and each file is named after its corresponding UD sent_id (for details, see Haug et al. 2023, referenced below).
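Because annotation files and UD sources share the sent_id as their name, the two trees can be lined up by sub-corpus and file name. The sketch below assumes that an annotation file and its UD source have the same base name (the stem) and differ at most in their extension; the file extensions are not documented here, so they are ignored. `pair_files` is a hypothetical helper, not part of the repository's scripts.

```python
from pathlib import Path


def pair_files(data_dir, version="anaphora-resolution"):
    """Pair each DRS annotation file with the UD source file sharing its stem.

    Hypothetical helper. Assumes data/drs-annotation/<version>/<sub-corpus>/
    and data/ud-sources/<sub-corpus>/ hold files named by sent_id, possibly
    with different extensions. Returns (sub-corpus, drs_path, ud_path or None).
    """
    data = Path(data_dir)
    drs_root = data / "drs-annotation" / version
    ud_root = data / "ud-sources"
    pairs = []
    for subcorpus_dir in sorted(p for p in drs_root.iterdir() if p.is_dir()):
        # Index the UD files of the matching sub-corpus by stem (sent_id).
        ud_dir = ud_root / subcorpus_dir.name
        ud_by_stem = {}
        if ud_dir.is_dir():
            ud_by_stem = {f.stem: f for f in ud_dir.iterdir() if f.is_file()}
        for drs_file in sorted(f for f in subcorpus_dir.iterdir() if f.is_file()):
            # None marks an annotation file with no same-named UD source.
            pairs.append((subcorpus_dir.name, drs_file, ud_by_stem.get(drs_file.stem)))
    return pairs
```

For example, `pair_files("data", version="no-anaphora-resolution")` would walk the unresolved version instead. Matching on the stem rather than the full name keeps the sketch independent of whatever extensions the files actually use.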

scripts contains a script, flatten_clause_notation.py, that 'flattens' PMB-style (Parallel Meaning Bank) DRSs into our simplified format, and a shell script, flatten_clause_notation_in_batch.sh, that runs it on multiple files at once.
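Both PMB-style and flattened DRSs are written in clausal notation, one clause per line. The reader below is a minimal sketch for inspecting such files. It assumes PMB conventions: whitespace-separated terms, double-quoted constants, and `%` starting a comment. It does not reproduce what flatten_clause_notation.py does. `parse_clauses` is a hypothetical name, and the sample clauses are illustrative rather than taken from DRASTIC.

```python
import re

# Matches a double-quoted constant, a %-comment running to end of line,
# or any other whitespace-delimited term (tried in that order).
TOKEN_RE = re.compile(r'"[^"]*"|%.*|\S+')


def parse_clauses(text):
    """Return a list of clauses, each a tuple of string terms.

    Sketch only: assumes one clause per line, whitespace-separated terms,
    quoted constants kept with their quotes, and '%' starting a comment.
    Blank and comment-only lines are skipped.
    """
    clauses = []
    for line in text.splitlines():
        terms = []
        for tok in TOKEN_RE.findall(line):
            if tok.startswith("%"):
                break  # the rest of the line is a comment
            terms.append(tok)
        if terms:
            clauses.append(tuple(terms))
    return clauses


sample = '''%%% A man sleeps.
b1 REF x1            % A [0...1]
b1 male "n.02" x1    % man [2...5]
b2 REF e1
b2 sleep "v.01" e1
b2 Agent e1 x1
'''

for clause in parse_clauses(sample):
    print(clause)
```

Quoted constants keep their quotes, so `"n.02"` stays distinguishable from variables such as `x1`. Comparing the output for a file before and after flattening can be a quick way to see which clauses the script changes.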

If you use this data, please cite the following paper:

Haug, Dag T. T., Jamie Y. Findlay and Ahmet Yıldırım. 2023. The long and the short of it: DRASTIC, a semantically annotated dataset containing sentences of more natural length. In Proceedings of the 4th International Workshop on Designing Meaning Representations (DMR 2023), 89–98. Association for Computational Linguistics.

  @inproceedings{haug_etal:drastic,
    title           = {The long and the short of it: \textsc{drastic}, a semantically annotated dataset containing sentences of more natural length},
    year            = {2023},
    author          = {Dag T. T. Haug and Jamie Y. Findlay and Ahmet Y\i{}ld\i{}r\i{}m},
    booktitle       = {{Proceedings of the 4th International Workshop on Designing Meaning Representations (DMR 2023)}},
    pages           = {89--98},
    publisher       = {Association for Computational Linguistics},
    url             = {https://aclanthology.org/2023.dmr-1.9}
  }

This data is licensed under a Creative Commons Attribution 4.0 International License.

