Word Similarity and Relatedness Datasets

This webpage provides a collection of datasets for benchmarking word similarity and relatedness. The datasets are described in:

Kliegr, Tomáš, and Ondřej Zamazal. "Antonyms are similar: Towards paradigmatic association approach to rating similarity in SimLex-999 and WordSim-353." Data & Knowledge Engineering 115 (2018): 174-193.

Link to the paper: Antonyms are similar: Towards paradigmatic association approach to rating similarity in SimLex-999 and WordSim-353
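Benchmarks of this kind are commonly used by comparing a model's similarity scores with the human ratings through Spearman's rank correlation. The sketch below illustrates that general practice and is not a procedure taken from the paper. The word pairs, ratings, and model scores are made-up placeholders, and the function names are hypothetical; loading the actual dataset files is left out because their layout is not described here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up placeholder values, NOT taken from any of the datasets.
human_ratings = {("word_a", "word_b"): 8.5,
                 ("word_c", "word_d"): 2.0,
                 ("word_e", "word_f"): 5.5}
model_scores = {("word_a", "word_b"): 0.71,
                ("word_c", "word_d"): 0.35,
                ("word_e", "word_f"): 0.20}

pairs = sorted(human_ratings)
rho = spearman([human_ratings[p] for p in pairs],
               [model_scores[p] for p in pairs])
print(f"Spearman correlation on the placeholder pairs: {rho:.2f}")
```

A rank-based measure is the usual choice because human ratings and model scores live on different scales; only the ordering of the pairs is compared.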

SimLex999-Czech

WordSim353 reannotated

WordSim353 and SimLex999 disambiguations

Acknowledgments and license

These datasets are licensed under a Creative Commons Attribution 4.0 International License.

The English SimLex-999 word pairs and instructions are credited to

Hill, Felix, Roi Reichart, and Anna Korhonen. "Simlex-999: Evaluating semantic models with (genuine) similarity estimation." Computational Linguistics 41.4 (2015): 665-695. 

The English WordSim-353 word pairs and instructions are credited to

Finkelstein, Lev, et al. "Placing search in context: The concept revisited." Proceedings of the 10th international conference on World Wide Web. ACM, 2001. 

The Czech WordSim-353 word pairs and instructions are credited to

Cinková, Silvie. "WordSim353 for Czech." In International Conference on Text, Speech, and Dialogue, pp. 190-197. Springer, Cham, 2016. 

Agirre, Eneko, et al. "A study on similarity and relatedness using distributional and wordnet-based approaches." Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2009. 

Website

More details are available on the WIN-353 website.