SPC, A Semantic Pleonasm Corpus

The Semantic Pleonasm Corpus (SPC) is a collection of three thousand sentences. Each sentence contains a pair of potentially semantically related words (chosen by a heuristic), and human annotators determine whether either word (or both) is redundant. The corpus offers two improvements over existing resources:

First, the corpus is filtered for grammatical sentences, so that the question of redundancy is separated from the question of grammaticality.
Second, the corpus is filtered for a balanced set of positive (redundant) and negative (i.e., non-redundant) examples.
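This README does not document SPC's file format, so the following is only a minimal sketch under stated assumptions: each record holds the sentence, the word pair, and a label saying which word (if any) annotators judged redundant. The names `PleonasmExample`, `Redundancy`, and `balance` are hypothetical, and downsampling the larger class is just one simple way to obtain a balanced set; it is not necessarily how SPC itself was built.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Redundancy(Enum):
    """Hypothetical label: which word of the pair (if any) was judged redundant."""
    NONE = "none"      # negative example: neither word is redundant
    FIRST = "first"
    SECOND = "second"
    BOTH = "both"


@dataclass(frozen=True)
class PleonasmExample:
    """Hypothetical record layout; SPC's real file format may differ."""
    sentence: str
    pair: tuple[str, str]  # the two potentially semantically related words
    label: Redundancy

    @property
    def is_positive(self) -> bool:
        # Positive means at least one word of the pair was judged redundant.
        return self.label is not Redundancy.NONE


def balance(examples: list[PleonasmExample], seed: int = 0) -> list[PleonasmExample]:
    """Downsample the larger class so positives and negatives are equal in number."""
    rng = random.Random(seed)
    pos = [e for e in examples if e.is_positive]
    neg = [e for e in examples if not e.is_positive]
    n = min(len(pos), len(neg))
    sample = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(sample)  # avoid all positives preceding all negatives
    return sample
```

Keeping the label as "which word is redundant" rather than a bare yes/no preserves the finer annotation while still letting `is_positive` drive the balancing step.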

The negative examples may make useful benchmark data: because every one of them contains a pair of words deemed semantically related, a successful system cannot rely on simple heuristics, such as semantic distance, to tell them apart from the positive examples.
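To make this concrete, here is a minimal sketch of the kind of shortcut the negative examples are meant to defeat: a rule that labels a pair redundant whenever its semantic similarity is at least some threshold. The similarity scores below are made-up toy numbers standing in for, say, cosine similarities from a word-embedding model (an assumption; they are not SPC data), and `best_threshold_accuracy` is a hypothetical helper.

```python
import math


def best_threshold_accuracy(scores: list[float], gold: list[bool]) -> float:
    """Accuracy of the best rule of the form 'redundant iff similarity >= t'.

    Trying every observed score as t, plus +inf (label everything
    non-redundant), covers every distinct threshold rule on this data.
    """
    candidates = sorted(set(scores)) + [math.inf]
    best = 0.0
    for t in candidates:
        preds = [s >= t for s in scores]
        acc = sum(p == g for p, g in zip(preds, gold)) / len(gold)
        best = max(best, acc)
    return best


# Toy gold labels: two redundant pairs followed by two non-redundant pairs.
gold = [True, True, False, False]

# Made-up similarities where the negative pairs are unrelated words.
unrelated_negatives = [0.9, 0.8, 0.2, 0.1]
# Made-up similarities where the negative pairs are related too.
related_negatives = [0.9, 0.8, 0.85, 0.95]

print(best_threshold_accuracy(unrelated_negatives, gold))  # 1.0
print(best_threshold_accuracy(related_negatives, gold))    # 0.5
```

When negatives are unrelated, a single threshold separates the classes perfectly; once the negatives are also related pairs, even the best threshold falls to chance level on this balanced toy set, which is the situation the negative examples are designed to create.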

Publication

Omid Kashefi, Andrew T. Lucas, and Rebecca Hwa. Semantic Pleonasm Detection. Proceedings of NAACL-HLT, pp. 225-230, New Orleans, LA, 2018.
