Chunking is a shallow form of parsing that identifies contiguous spans of tokens that form syntactic units, such as noun phrases or verb phrases.
Example:
Vinken | , | 61 | years | old |
---|---|---|---|---|
B-NP | I-NP | I-NP | I-NP | I-NP |
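
The tags in the example follow a BIO-style scheme, where a `B-` tag opens a chunk and `I-` tags continue it. As a rough illustration of how such tags map back to spans, here is a minimal sketch. The function name `bio_to_chunks` and the lenient handling of an `I-` tag with no open chunk are assumptions made for this example, not part of any particular toolkit.

```python
from typing import List, Tuple


def bio_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (chunk_type, start, end) spans; end is exclusive."""
    chunks = []
    start, ctype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last chunk
        prefix, _, label = tag.partition("-")
        # An open chunk ends at O, at a B- tag, or when the I- label changes type.
        if start is not None and (prefix != "I" or label != ctype):
            chunks.append((ctype, start, i))
            start, ctype = None, None
        # A chunk starts at B-, or at an I- tag with no open chunk (lenient choice).
        if prefix == "B" or (prefix == "I" and start is None):
            start, ctype = i, label
    return chunks


tokens = ["Vinken", ",", "61", "years", "old"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]
for ctype, start, end in bio_to_chunks(tags):
    print(ctype, tokens[start:end])
```

With the tags from the table, the whole five-token sequence comes back as a single NP span.
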
The Penn Treebank is typically used for evaluating chunking. Sections 15-18 are used for training, section 19 for development, and section 20 for testing. Models are evaluated based on F1.
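
For reference, a minimal sketch of how a chunk-level F1 score can be computed. It assumes the common exact-match convention, in which a predicted chunk counts as correct only if both its type and its boundaries match a gold chunk. The `Chunk` alias, the `chunk_f1` name, and the toy spans are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

Chunk = Tuple[str, int, int]  # (chunk type, start index, exclusive end index)


def chunk_f1(gold: Set[Chunk], pred: Set[Chunk]) -> float:
    """Exact-match F1 over chunks: type and both boundaries must agree."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the last predicted NP has the wrong right boundary.
gold = {("NP", 0, 2), ("VP", 2, 3), ("NP", 3, 5)}
pred = {("NP", 0, 2), ("VP", 2, 3), ("NP", 3, 4)}
print(round(100 * chunk_f1(gold, pred), 2))  # 66.67
```

To score a full test set this way, each span would also need a sentence identifier so that chunks from different sentences do not collide in the sets.
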
Model | F1 score | Paper / Source |
---|---|---|
JMT (Hashimoto et al., 2017) | 95.77 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks |
Low supervision (Søgaard and Goldberg, 2016) | 95.57 | Deep multi-task learning with low level tasks supervised at lower layers |
Suzuki and Isozaki (2008) | 95.15 | Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data |
NCRF++ (Yang and Zhang, 2018) | 95.06 | NCRF++: An Open-source Neural Sequence Labeling Toolkit |