Chinese word segmentation is the task of splitting Chinese text (a sequence of Chinese characters) into words.
Example:
'上海浦东开发与建设同步' → ['上海', '浦东', '开发', '与', '建设', '同步']
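The task can be illustrated with a classical (non-neural) baseline: forward maximum matching against a word dictionary. This is a minimal sketch, not one of the systems listed below; the tiny dictionary is a hypothetical illustration, not a real lexicon.

```python
# Toy forward-maximum-matching segmenter. DICT is a hypothetical
# mini-lexicon covering only the example sentence.
DICT = {"上海", "浦东", "开发", "建设", "同步", "与"}
MAX_LEN = max(len(w) for w in DICT)

def fmm_segment(text):
    """Greedily match the longest dictionary word at each position;
    unmatched characters fall back to single-character words."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            if text[i:j] in DICT:
                words.append(text[i:j])
                i = j
                break
        else:  # no dictionary word starts here
            words.append(text[i])
            i += 1
    return words

print(fmm_segment("上海浦东开发与建设同步"))
# → ['上海', '浦东', '开发', '与', '建设', '同步']
```

Dictionary matching fails on out-of-vocabulary words and ambiguous boundaries, which is what motivates the character-based neural models in the tables below.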
♠ marks systems that use character unigrams as input. ♣ marks systems that use character bigrams as input.
- Yang et al. (2018): Lattice LSTM-CRF + BPE subword embeddings ♠♣
- Ma et al. (2018): BiLSTM-CRF + hyper-parameter search ♠♣
- Yang et al. (2017): Transition-based + beam search + rich pretraining ♠♣
- Zhou et al. (2017): Greedy search + word context ♠
- Chen et al. (2017): BiLSTM-CRF + adversarial loss ♠♣
- Cai et al. (2017): Greedy search + span representation ♠
- Kurita et al. (2017): Transition-based + joint model ♠
- Liu et al. (2016): Neural semi-CRF ♠
- Cai and Zhao (2016): Greedy search ♠
- Chen et al. (2015a): Gated recursive NN ♠♣
- Chen et al. (2015b): BiLSTM-CRF ♠♣
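The ♠/♣ distinction above can be sketched concretely: character unigram and bigram features are extracted per character position before being mapped to embeddings. The `</s>` boundary padding token here is an assumed convention, not taken from any particular paper.

```python
def char_features(text, pad="</s>"):
    """Character unigram (♠) and bigram (♣) features, one of each per
    character position; `pad` marks the sentence boundary (an assumed
    convention -- papers differ in how they pad the final bigram)."""
    unigrams = list(text)
    bigrams = [text[i:i + 2] if i + 1 < len(text) else text[i] + pad
               for i in range(len(text))]
    return unigrams, bigrams

uni, bi = char_features("上海浦东")
# uni: ['上', '海', '浦', '东']
# bi:  ['上海', '海浦', '浦东', '东</s>']
```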
F1-score

Model | F1 | Paper / Source | Code |
---|---|---|---|
Ma et al. (2018) | 96.7 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
Yang et al. (2018) | 96.3 | Subword Encoding in Lattice LSTM for Chinese Word Segmentation | Github |
Yang et al. (2017) | 96.2 | Neural Word Segmentation with Rich Pretraining | Github |
Zhou et al. (2017) | 96.2 | Word-Context Character Embeddings for Chinese Word Segmentation | |
Chen et al. (2017) | 96.2 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |
Chen et al. (2015b) | 96.0 | Long Short-Term Memory Neural Networks for Chinese Word Segmentation | Github |
Liu et al. (2016) | 95.5 | Exploring Segment Representations for Neural Segmentation Models | Github |

Model | F1 | Paper / Source | Code |
---|---|---|---|
Ma et al. (2018) | 96.6 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
Kurita et al. (2017) | 96.2 | Neural Joint Model for Transition-based Chinese Syntactic Analysis | |

Model | F1 | Paper / Source | Code |
---|---|---|---|
Ma et al. (2018) | 96.2 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
Yang et al. (2017) | 95.7 | Neural Word Segmentation with Rich Pretraining | Github |
Cai et al. (2017) | 95.3 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
Chen et al. (2017) | 94.8 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |

Model | F1 | Paper / Source | Code |
---|---|---|---|
Ma et al. (2018) | 97.2 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
Yang et al. (2017) | 96.9 | Neural Word Segmentation with Rich Pretraining | Github |
Cai et al. (2017) | 95.6 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
Chen et al. (2017) | 95.6 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |

Model | F1 | Paper / Source | Code |
---|---|---|---|
Yang et al. (2017) | 96.3 | Neural Word Segmentation with Rich Pretraining | Github |
Ma et al. (2018) | 96.1 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
Yang et al. (2018) | 95.9 | Subword Encoding in Lattice LSTM for Chinese Word Segmentation | Github |
Cai et al. (2017) | 95.8 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
Liu et al. (2016) | 95.7 | Exploring Segment Representations for Neural Segmentation Models | Github |
Cai and Zhao (2016) | 95.7 | Neural Word Segmentation Learning for Chinese | Github |
Chen et al. (2017) | 94.3 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |

Model | F1 | Paper / Source | Code |
---|---|---|---|
Ma et al. (2018) | 98.1 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
Yang et al. (2018) | 97.8 | Subword Encoding in Lattice LSTM for Chinese Word Segmentation | Github |
Liu et al. (2016) | 97.6 | Exploring Segment Representations for Neural Segmentation Models | Github |
Yang et al. (2017) | 97.5 | Neural Word Segmentation with Rich Pretraining | Github |
Cai et al. (2017) | 97.1 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
Cai and Zhao (2016) | 96.4 | Neural Word Segmentation Learning for Chinese | Github |
Chen et al. (2017) | 96.0 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |
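The F1 scores in the tables above are word-level: a predicted word counts as correct only when its character span exactly matches a gold word's span. A minimal sketch of the metric (a common convention for segmentation evaluation, not any single paper's official scorer):

```python
def to_spans(words):
    """Convert a word list to a set of (start, end) character spans."""
    spans, i = set(), 0
    for w in words:
        spans.add((i, i + len(w)))
        i += len(w)
    return spans

def word_f1(gold, pred):
    """Word-level F1 over exactly matching character spans."""
    g, p = to_spans(gold), to_spans(pred)
    tp = len(g & p)  # words whose boundaries match exactly
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)

print(round(word_f1(["上海", "浦东", "开发"], ["上海", "浦", "东", "开发"]), 4))
# → 0.5714  (precision 2/4, recall 2/3)
```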