first commit
fkxxyz committed Jun 12, 2020
0 parents commit e87060b
Showing 4 changed files with 25 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
词典360万(个人整理).txt filter=lfs diff=lfs merge=lfs -text
11 changes: 11 additions & 0 deletions README.md
@@ -0,0 +1,11 @@
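Author: 刘邵博  Version: v1
This is a large dictionary that I compiled by merging several existing dictionaries; it contains 3,669,216 entries.
Each entry has the format: word\tpart-of-speech\tfrequency.
Word frequencies were obtained by segmenting 270 GB of news text with ansj and counting the resulting tokens.
One point needs special mention: during compilation, the part of speech of some words could not be determined, so those words carry one of two special tags, nw or comb:
1. The tag nw means the part of speech of the word is unknown.
2. The tag comb means the word was split into two words when segmented with ansj's NLP segmentation.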
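
The entry format and the two special tags described above can be illustrated with a short parsing sketch. This is only an illustration under stated assumptions: the names `Entry`, `parse_entries`, and `has_known_pos` are hypothetical, the frequency column is assumed to be an integer, and the sample rows are made up rather than taken from the dictionary.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    word: str
    pos: str   # part-of-speech tag
    freq: int  # assumed to be an integer count


# Special tags from the README: nw = part of speech unknown,
# comb = word that ansj's NLP segmentation splits into two words.
SPECIAL_TAGS = {"nw", "comb"}


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per well-formed, tab-separated three-field line."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # assumed policy: skip blank or malformed lines
        word, pos, freq = fields
        try:
            yield Entry(word, pos, int(freq))
        except ValueError:
            continue  # assumed policy: skip rows with a non-integer frequency


def has_known_pos(entry: Entry) -> bool:
    """True when the entry carries a regular tag rather than nw or comb."""
    return entry.pos not in SPECIAL_TAGS


# Made-up sample rows in the documented format (not real dictionary data).
sample = [
    "中国\tns\t12345\n",
    "某新词\tnw\t7\n",
    "人工智能技术\tcomb\t42\n",
    "bad line\n",
]
entries = list(parse_entries(sample))
known = [e for e in entries if has_known_pos(e)]
tag_counts = Counter(e.pos for e in entries)
```

To run this over the real file, pass an open file object to `parse_entries`, for example `open(path, encoding="utf-8")`; the encoding is an assumption and may need to be changed (for instance to `gb18030`) if the file is not UTF-8.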

Official site: http://www.nlpcn.org

*********************************************************************************************************
3 changes: 3 additions & 0 deletions 词典360万(个人整理).txt
Git LFS file not shown
10 changes: 10 additions & 0 deletions 词典说明.txt
@@ -0,0 +1,10 @@
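Author: 刘邵博  Version: v1
This is a large dictionary that I compiled by merging several existing dictionaries; it contains 3,669,216 entries.
Each entry has the format: word\tpart-of-speech\tfrequency.
Word frequencies were obtained by segmenting 270 GB of news text with ansj and counting the resulting tokens.
One point needs special mention: during compilation, the part of speech of some words could not be determined, so those words carry one of two special tags, nw or comb:
1. The tag nw means the part of speech of the word is unknown.
2. The tag comb means the word was split into two words when segmented with ansj's NLP segmentation.

Official site: http://www.nlpcn.org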
*********************************************************************************************************
