SPC, A Semantic Pleonasm Corpus

The Semantic Pleonasm Corpus (SPC) is a collection of three thousand sentences. Each sentence features a pair of potentially semantically related words (chosen by a heuristic); human annotators determine whether either (or both) of the words is redundant. The corpus offers two improvements over current resources:

  • First, the corpus includes only grammatical sentences, so that the question of redundancy is separated from grammaticality.
  • Second, the corpus is filtered for a balanced set of positive examples (redundancy present) and negative examples (no redundancy).

The negative examples may make useful benchmark data: because every one of them contains a pair of words deemed to be semantically related, a successful system cannot rely on simple heuristics, such as semantic distance, to tell redundant pairs from non-redundant ones.
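
To illustrate why semantic distance alone is not enough, here is a minimal sketch of a naive baseline that flags a word pair as redundant whenever the cosine similarity of the two word vectors exceeds a threshold. Everything in it is an assumption made for illustration: the toy vectors, the threshold, the function names, and the two example pairs are hypothetical and are not taken from the corpus or from the paper.

```python
import math

# Hypothetical toy word vectors, made up for this sketch.
# A real system would load trained embeddings instead.
TOY_VECTORS = {
    "free": (0.9, 0.1, 0.2),
    "gift": (0.8, 0.2, 0.3),
    "doctor": (0.2, 0.9, 0.3),
    "nurse": (0.3, 0.8, 0.4),
}


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def naive_is_redundant(word_a, word_b, threshold=0.9):
    """Naive heuristic: call the pair redundant if the words are close."""
    sim = cosine_similarity(TOY_VECTORS[word_a], TOY_VECTORS[word_b])
    return sim >= threshold


# Illustrative examples (not corpus data): (word_a, word_b, is_redundant).
examples = [
    ("free", "gift", True),     # "a free gift": one word is arguably redundant
    ("doctor", "nurse", False), # related words, but neither is redundant
]

predictions = [naive_is_redundant(a, b) for a, b, _ in examples]
correct = sum(pred == gold for pred, (_, _, gold) in zip(predictions, examples))

for (a, b, gold), pred in zip(examples, predictions):
    print(f"{a}/{b}: predicted={pred}, gold={gold}")
print(f"correct: {correct} of {len(examples)}")
```

With these toy vectors both pairs clear the threshold, so the heuristic labels the related-but-not-redundant pair as redundant as well. A corpus whose negative examples all contain related word pairs exposes exactly this kind of failure.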

Publication

Omid Kashefi, Andrew T. Lucas, and Rebecca Hwa. Semantic Pleonasm Detection. In Proceedings of NAACL-HLT, pp. 225-230, New Orleans, LA, 2018. [bibtex]