Skip to content

Latest commit

 

History

History
25 lines (21 loc) · 745 Bytes

readme.md

File metadata and controls

25 lines (21 loc) · 745 Bytes

readme for MSRA dataset

Task

Named Entity Recognition

Description

Tags: LOC(地名), ORG(机构名), PER(人名)
Tag Strategy:BIO
Split: '\t' (北\tB-LOC)
Data Size:
Train data set ( msra_train_bio.txt ):

句数 字符数 LOC数 ORG数 PER数
45000 2171573 36860 20584 17615

Test data set ( msra_test_bio.txt )

句数 字符数 LOC数 ORG数 PER数
3442 172601 2886 1331 1973

Reference:
The third international Chinese language processing bakeoff: Word segmentation and named entity recognition
https://github.com/dox1994/nlp_datasets