omidkashefi/pleonasm


SPC, A Semantic Pleonasm Corpus

The Semantic Pleonasm Corpus (SPC) is a collection of three thousand sentences. Each sentence contains a pair of potentially semantically related words (selected by a heuristic), and human annotators determine whether either (or both) of the words is redundant. The corpus offers two improvements over existing resources:

  • First, the corpus is filtered for grammatical sentences, so that the question of redundancy is separated from the question of grammaticality.
  • Second, the corpus is filtered for a balanced set of positive (redundant) and negative (non-redundant) examples.

The negative examples may make useful benchmark data: because every one of them contains a pair of words deemed semantically related, a successful system cannot rely on simple heuristics, such as semantic distance, to tell redundant pairs from non-redundant ones.
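To make that argument concrete, here is a minimal sketch of the kind of naive semantic-distance baseline described above: it labels a word pair as redundant whenever the cosine similarity of the two word vectors exceeds a threshold. Everything in it is hypothetical and for illustration only: the three-dimensional vectors, the 0.9 threshold, the function names, and the two example pairs are not drawn from SPC or from the paper. Because both pairs are semantically related, both score high and the baseline calls both redundant, even though only one is.

```python
import math

# Hypothetical toy word vectors (illustration only, NOT real embeddings);
# a realistic baseline would use pretrained vectors such as word2vec or GloVe.
TOY_VECTORS = {
    "free": [0.9, 0.1, 0.3],
    "gift": [0.8, 0.2, 0.35],
    "cat": [0.3, 0.8, 0.5],
    "dog": [0.35, 0.75, 0.55],
}

# Hypothetical labeled pairs (not from SPC): (word1, word2, is_redundant)
EXAMPLES = [
    ("free", "gift", True),   # "I received a free gift": "free" adds nothing
    ("cat", "dog", False),    # "The cat chased the dog": related, not redundant
]


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_redundant(w1, w2, threshold=0.9):
    """Naive baseline: call a pair redundant if its words are close in vector space."""
    return cosine_similarity(TOY_VECTORS[w1], TOY_VECTORS[w2]) >= threshold


for w1, w2, gold in EXAMPLES:
    sim = cosine_similarity(TOY_VECTORS[w1], TOY_VECTORS[w2])
    print(f"{w1}/{w2}: similarity={sim:.3f}, "
          f"predicted={predict_redundant(w1, w2)}, gold={gold}")
```

Since every SPC pair was chosen for semantic relatedness, a distance threshold like this one tends to push all pairs to the same side, which is why the negative examples are a useful stress test.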

Publication

Omid Kashefi, Andrew T. Lucas, Rebecca Hwa. Semantic Pleonasm Detection. In Proceedings of NAACL-HLT, pp. 225-230, New Orleans, LA, 2018.
