Chunking

Chunking is a shallow form of parsing that identifies contiguous, non-overlapping spans of tokens that form syntactic units such as noun phrases or verb phrases.

Example:

Vinken	,	61	years	old
B-NP	I-NP	I-NP	I-NP	I-NP
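
The tags in the example follow the common BIO convention, where a `B-` prefix opens a chunk and an `I-` prefix continues it. As a minimal sketch of how such a tag sequence maps back to spans, the hypothetical helper `bio_to_chunks` below (not part of any library) groups tags into `(type, start, end)` tuples with an exclusive end index. It assumes that a stray `I-` tag with no open chunk starts a new one, which is one of several possible conventions.

```python
def bio_to_chunks(tags):
    """Convert a BIO tag sequence into (chunk_type, start, end) spans (end exclusive)."""
    chunks = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Split "B-NP" into prefix "B" and type "NP"; a bare "O" has no type.
        prefix, _, kind = tag.partition("-")
        # Close the open chunk on "O", on a new "B-", or on an "I-" of another type.
        if label is not None and (prefix != "I" or kind != label):
            chunks.append((label, start, i))
            start, label = None, None
        # Open a chunk on "B-", or on a stray "I-" when nothing is open (assumption).
        if prefix in ("B", "I") and label is None:
            start, label = i, kind
    # Flush a chunk that runs to the end of the sentence.
    if label is not None:
        chunks.append((label, start, len(tags)))
    return chunks


tokens = ["Vinken", ",", "61", "years", "old"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]
for kind, start, end in bio_to_chunks(tags):
    print(kind, tokens[start:end])
```

For the example sentence this yields a single noun-phrase chunk covering all five tokens.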

Penn Treebank

The Penn Treebank is typically used for evaluating chunking. Sections 15-18 are used for training, section 19 for development, and section 20 for testing. Models are evaluated by F1 score.
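
The text above does not spell out how F1 is computed. The sketch below assumes the usual chunk-level definition, in which a predicted chunk counts as correct only if both its type and its exact boundaries match a gold chunk. The function names `extract_chunks` and `chunk_f1` are illustrative, and the gold and predicted tag sequences are made-up toy data, not benchmark output.

```python
def extract_chunks(tags):
    """Return the set of (chunk_type, start, end) spans in a BIO tag sequence."""
    chunks = set()
    start, label = None, None
    # A trailing "O" sentinel flushes a chunk that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, kind = tag.partition("-")
        # Close the open chunk on "O", on a new "B-", or on an "I-" of another type.
        if label is not None and (prefix != "I" or kind != label):
            chunks.add((label, start, i))
            start, label = None, None
        # Open a chunk on "B-", or on a stray "I-" when nothing is open (assumption).
        if prefix in ("B", "I") and label is None:
            start, label = i, kind
    return chunks


def chunk_f1(gold_tags, pred_tags):
    """Chunk-level F1 under exact match of type and boundaries (assumed definition)."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the prediction merges the last two gold chunks into one verb phrase.
gold = ["B-NP", "I-NP", "O", "B-VP", "B-NP"]
pred = ["B-NP", "I-NP", "O", "B-VP", "I-VP"]
print(chunk_f1(gold, pred))
```

In this toy case one of two predicted chunks and one of three gold chunks match, giving precision 1/2, recall 1/3, and an F1 of 0.4. A corpus-level score would sum the correct, predicted, and gold counts over all sentences before computing precision and recall, rather than averaging per-sentence F1.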

Model	F1 score	Paper / Source
JMT (Hashimoto et al., 2017)	95.77	A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
Low supervision (Søgaard and Goldberg, 2016)	95.57	Deep multi-task learning with low level tasks supervised at lower layers
Suzuki and Isozaki (2008)	95.15	Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data
NCRF++ (Yang and Zhang, 2018)	95.06	NCRF++: An Open-source Neural Sequence Labeling Toolkit

