SloATOMIC 2020 contains Slovene translations of the examples in the ATOMIC 2020 data set. The translations were produced with the DeepL translation service.
The main purpose of the data set is to train Slovene commonsense reasoning models.
The data set is publicly available via the clarin.si repository.
The data set is available in the data folder which contains the following files:
- sloatomic_train.tsv: The training set.
- sloatomic_dev.tsv: The development set.
- sloatomic_test.tsv.automatic_all: The test set containing all of the automatically translated examples.
- sloatomic_test.tsv.automatic_10k: The selection of 10k examples from the complete test set.
- sloatomic_test.tsv.manual_10k: The manually inspected and fixed examples of the automatic 10k subset.
The data is in TSV (tab-separated values) format, with one example per line. The columns are:
- head_event: The head event of the example.
- relation: The relation between the head event and the tail event.
- tail_event: The tail event of the example.
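The three-column layout above can be read with Python's standard `csv` module. The following is a minimal sketch, not an official loader: the `Example` type, the `read_examples` function, and the `has_header` flag are hypothetical names introduced here, and the sketch assumes UTF-8 encoded files with no quoting. Whether the files start with a header row is not stated above, so that is left as a parameter.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Example(NamedTuple):
    """One row of the data set (hypothetical helper type)."""
    head_event: str
    relation: str
    tail_event: str


def read_examples(lines: Iterable[str], has_header: bool = False) -> Iterator[Example]:
    """Yield one Example per well-formed, tab-separated line.

    Assumption: fields are not quoted, so quoting is disabled and any
    quote characters in the text are kept literally.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for index, row in enumerate(reader):
        if has_header and index == 0:
            continue  # skip the header row when the caller says there is one
        if len(row) != 3:
            continue  # skip blank or malformed lines
        yield Example(*row)


if __name__ == "__main__":
    # Invented sample row for illustration only; not taken from the data set.
    sample = ["PersonX gre v trgovino\txNeed\timeti denar\n"]
    for example in read_examples(sample):
        print(example.head_event, example.relation, example.tail_event)
```

To read an actual file, pass an open file handle instead of the sample list, for example `open("data/sloatomic_train.tsv", encoding="utf-8", newline="")`, where the `data/` prefix follows the folder layout described above.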
The data set has been used in the following paper:
- SLOmet - Slovenian Commonsense Description. Adrian Mladenić Grobelnik, Erik Novak, Dunja Mladenić, Marko Grobelnik. SiKDD Slovenian KDD Conference, 2022.
If you use the data set in your research, please cite it as follows:
@misc{11356/1724,
title = {Slovene Translation of the Atomic 2020 data set {SloATOMIC} 2020},
author = {Mladeni{\'c} Grobelnik, Adrian and Novak, Erik and Mladeni{\'c}, Dunja and Grobelnik, Marko},
url = {http://hdl.handle.net/11356/1724},
note = {Slovenian language resource repository {CLARIN}.{SI}},
copyright = {Creative Commons - Attribution-{ShareAlike} 4.0 International ({CC} {BY}-{SA} 4.0)},
issn = {2820-4042},
year = {2022}
}
This work was developed by the Department of Artificial Intelligence at the Jozef Stefan Institute.
The work was supported by the Slovenian Research Agency and the RSDO project.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.