HuCommitmentBank is a corpus of naturally occurring discourses whose final sentence contains a clause-embedding predicate under an entailment-canceling operator. The dataset is also part of the Hungarian Language Understanding Evaluation Benchmark Kit (HuLU).
It was designed on the model of the CommitmentBank corpus (de Marneffe et al., 2019). Data collection and annotation were carried out as part of Péter Hatvani's master's thesis (A Corpus to Investigate Projection Methods: The Hungarian Commitment Bank. Thesis, Pázmány Péter Katolikus Egyetem, Bölcsészet- és Társadalomtudományi Kar, Angol-Amerikai Intézet, Elméleti Nyelvészet Tanszék). Four annotators collected a total of 1100 valid text fragments from MNSZ2 (Oravecz et al., 2014). Each of the 1100 examples was labelled by 5 annotators on a 7-point Likert scale (from -3 to 3, where 0 meant that the annotator could not decide whether the speaker believes the subordinate clause to be true or false). A total of 9 native Hungarian annotators worked on the corpus.
As in SuperGLUE, HuLU integrates this corpus as an inference task. The 7-point labels have been collapsed into a three-way classification: the original labels -3 and -2 are "contradiction", -1, 0 and 1 are "neutral", and 2 and 3 are "entailment".
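The label compression described above can be sketched in a few lines of Python. This is an illustration only: the function name `likert_to_nli` is hypothetical, and the sketch assumes each instance already has a single integer score in the range -3 to 3. How the five annotator ratings were aggregated into that score is not stated here.

```python
def likert_to_nli(score: int) -> str:
    """Map a 7-point score (-3..3) to a three-way inference label.

    Assumption: `score` is one integer per instance; the aggregation of
    the individual annotator ratings is outside the scope of this sketch.
    """
    if not -3 <= score <= 3:
        raise ValueError(f"score out of range: {score}")
    if score <= -2:
        return "contradiction"  # original labels -3 and -2
    if score >= 2:
        return "entailment"     # original labels 2 and 3
    return "neutral"            # original labels -1, 0 and 1


if __name__ == "__main__":
    for s in range(-3, 4):
        print(s, likert_to_nli(s))
```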
For each instance, there is an id, a premise, a hypothesis and a label.
An example:
{
"id": "0",
"premise": "Ha ezt nem erted akkor teljesen felesleges minden vita. hogy a valóban meggyőző érveid dacára mégiscsak volt a világban egy olyan kísérlet,kedves Derek, - csak hát az itt a bökkenő, Nagy igazságokat írsz a szocdem mozgalomról, amelyben párszázmillióan vettek részt, és amely kísérlet eredménye közismert ugye. Lehet, hogy neked ez nem tetszik, de képzeld, lehet, hogy nekem se tetszik), de volt.",
"hypothesis": "A beszélő szerint nem tetszik neki sem.",
"label": "entailment"
}
- id: unique ID of the instance;
- premise: the premise, containing one embedded clause under an entailment-canceling operator;
- hypothesis: the hypothesis;
- label: "entailment" if the hypothesis is entailed by the premise, "contradiction" if the hypothesis contradicts the premise, and "neutral" otherwise.
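The field layout above could be checked with a small validator along these lines. This is a minimal sketch: `validate_instance` and its `require_label` flag are hypothetical names, and the assumption that unlabeled test instances simply lack the `label` key is ours, not something the dataset documentation states.

```python
from typing import Any, Mapping

# Text fields every instance is expected to carry.
REQUIRED_FIELDS = ("id", "premise", "hypothesis")
VALID_LABELS = ("entailment", "contradiction", "neutral")


def validate_instance(instance: Mapping[str, Any],
                      require_label: bool = True) -> list:
    """Return a list of problems found in one instance (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = instance.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {field}")
    label = instance.get("label")
    if label is None:
        # Assumption: unlabeled (test) instances omit the label entirely.
        if require_label:
            problems.append("missing field: label")
    elif label not in VALID_LABELS:
        problems.append(f"unexpected label: {label!r}")
    return problems


if __name__ == "__main__":
    # Placeholder texts, not taken from the corpus.
    toy = {"id": "0", "premise": "premise text",
           "hypothesis": "hypothesis text", "label": "entailment"}
    print(validate_instance(toy))
```

Passing `require_label=False` lets the same check run over a split that is distributed without labels.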
The data is distributed in three splits: a training set (250 instances), a development set (103 instances) and a test set (250 instances). Only instances of Hatvani's dataset whose annotator ratings have a standard deviation below 1 are included in HuLU.
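The agreement filter can be illustrated with a short Python sketch. It assumes the standard deviation is computed over the five ratings of a single instance, and it uses the population standard deviation (`statistics.pstdev`); whether the original selection used the population or the sample formula is not stated, so treat that choice as an assumption. The function name `has_consistent_ratings` and the example ratings are hypothetical.

```python
from statistics import pstdev


def has_consistent_ratings(ratings, threshold=1.0):
    """Return True if the ratings' standard deviation is below `threshold`.

    Assumption: population standard deviation over one instance's ratings.
    """
    return pstdev(ratings) < threshold


if __name__ == "__main__":
    # Made-up ratings on the -3..3 scale, five annotators per instance.
    print(has_consistent_ratings([2, 3, 2, 3, 2]))    # annotators mostly agree
    print(has_consistent_ratings([-3, 3, 0, 2, -1]))  # annotators disagree
```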
The test set is distributed without labels. To evaluate your model, please contact us, or check HuLU's website for an automatic evaluation.
HuCommitmentBank is released under the CC-BY-SA-4.0 License.
If you use this resource or any part of its documentation, please cite:
Noémi Ligeti-Nagy, Gergő Ferenczi, Enikő Héja, László János Laki, Noémi Vadász, Zijian Győző Yang, and Tamás Váradi. 2024. HuLU: Hungarian Language Understanding Benchmark Kit. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8360–8371, Torino, Italia. ELRA and ICCL.
@inproceedings{ligeti-nagy-etal-2024-hulu-hungarian,
title = "{H}u{LU}: {H}ungarian Language Understanding Benchmark Kit",
author = "Ligeti-Nagy, No{\'e}mi and
Ferenczi, Gerg{\H{o}} and
H{\'e}ja, Enik{\H{o}} and
Laki, L{\'a}szl{\'o} J{\'a}nos and
Vad{\'a}sz, No{\'e}mi and
Yang, Zijian Gy{\H{o}}z{\H{o}} and
V{\'a}radi, Tam{\'a}s",
editor = "Calzolari, Nicoletta and
Kan, Min-Yen and
Hoste, Veronique and
Lenci, Alessandro and
Sakti, Sakriani and
Xue, Nianwen",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
month = may,
year = "2024",
address = "Torino, Italia",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.733",
pages = "8360--8371",
}
and to:
Ligeti-Nagy, N., Héja, E., Laki, L. J., Takács, D., Yang, Z. Gy. and Váradi, T. (2023) Hát te mekkorát nőttél! - A HuLU első életéve új adatbázisokkal és webszolgáltatással [Look at how much you have grown! - The first year of HuLU with new databases and with webservice]. In: Berend, G., Gosztolya, G. and Vincze, V. (eds), XIX. Magyar Számítógépes Nyelvészeti Konferencia. Szeged, Szegedi Tudományegyetem, Informatikai Intézet. To appear.
@inproceedings{ligetihulu2023,
author = "Ligeti-Nagy, Noémi and Héja, Enikő and Laki, László János and Takács, Dávid and Yang, Zijian {\relax Gy}őző and Váradi, Tamás",
title = "Hát te mekkorát nőttél! - A HuLU első életéve új adatbázisokkal és webszolgáltatással [Look at how much you have grown! - The first year of HuLU with new databases and with webservice]",
booktitle = "XIX. Magyar Számítógépes Nyelvészeti Konferencia",
year = "2023",
editor = "Berend, Gábor and Gosztolya, Gábor and Vincze, Veronika",
address = "Szeged",
publisher = "Szegedi Tudományegyetem, Informatikai Intézet",
note = "To appear."
}