forked from phelrine/stakk
StaKK: Statistical Kana Kanji conversion engine
StaKK is a Japanese language processor with the following features.
* Current Features
Kana-Kanji converter
Predictive Input or Query Suggestion
Spelling Correction
Morphological Analyzer
HTTP API Server
Raw Trie Operations
Currently, StaKK uses the dictionary of Mozc, the open-source version of Google Japanese IME.
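As a rough illustration of how predictive input can sit on top of trie operations, here is a minimal Python sketch of a prefix lookup. It is not StaKK's implementation: the class and method names (TrieNode, Trie, insert, predict) are hypothetical, the entries are a few reading/word pairs borrowed from the examples in this README, and results come back in simple code-point order rather than in whatever ranking StaKK applies.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.words = []     # surfaces whose reading ends exactly at this node


class Trie:
    """Toy trie keyed by reading (hypothetical, for illustration only)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, reading, surface):
        node = self.root
        for ch in reading:
            node = node.children.setdefault(ch, TrieNode())
        node.words.append(surface)

    def predict(self, prefix, limit=50):
        """Return up to `limit` (reading, surface) pairs whose reading starts with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no reading starts with this prefix
        results = []
        stack = [(prefix, node)]
        while stack and len(results) < limit:
            reading, current = stack.pop()
            for surface in current.words:
                results.append((reading, surface))
            # Push children in reverse so they are popped in code-point order.
            for ch in sorted(current.children, reverse=True):
                stack.append((reading + ch, current.children[ch]))
        return results[:limit]


if __name__ == "__main__":
    trie = Trie()
    for reading, surface in [
        ("ありがとう", "ありがとう"),
        ("ありがとう", "有難う"),
        ("ありがと", "ありがと"),
        ("ありえってぃ", "アリエッティ"),
    ]:
        trie.insert(reading, surface)
    for reading, surface in trie.predict("あり"):
        print(reading, surface)
```

The default limit of 50 mirrors the default candidate count documented for the HTTP predict method; everything else in the sketch is an assumption.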
* Reverse Mode
StaKK implements two modes: normal mode and reverse mode.
Normal Mode: Input Reading or Kana, Output Word or Kanji.
Reverse Mode: Input Word or Kanji, Output Reading or POS tag etc.
Combined with each method, these two modes serve the purposes shown in the following table.
| Method | Normal Mode | Reverse Mode |
| Convert | Kana-Kanji Conversion | Morphological Analysis |
| Predict | Predictive Input | Query Suggestion |
| Spell Correct | Correct with Reading | Correct with Surface |
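One way to picture the two modes is as the same dictionary indexed by different keys: reading in normal mode, surface in reverse mode. The Python sketch below shows only that idea. The function name build_index and the toy entries (taken from the conversion example in this README) are hypothetical, and StaKK's actual data structures are not described here.

```python
from collections import defaultdict

# Toy (reading, surface) entries borrowed from the examples in this README.
ENTRIES = [
    ("わたし", "私"),
    ("なまえ", "名前"),
    ("なかの", "中野"),
    ("ありがとう", "有難う"),
]


def build_index(entries, reverse=False):
    """Index entries by reading (normal mode) or by surface (reverse mode)."""
    index = defaultdict(list)
    for reading, surface in entries:
        key, value = (surface, reading) if reverse else (reading, surface)
        index[key].append(value)
    return dict(index)


if __name__ == "__main__":
    normal = build_index(ENTRIES)                # reading -> surfaces
    reverse = build_index(ENTRIES, reverse=True)  # surface -> readings
    print(normal["わたし"])
    print(reverse["名前"])
```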
* Usage
Common Options:
command [-r] [-d dictionary] [-c connection]
-d dictionary: Mozc format dictionary file path. default: data/dictionary.txt
-c connection: mozc format connection file path. default: data/connection.txt
-i id.def: mozc format class id definition file path. default: data/id.def
This option is not available for the stakk_test command.
-r: "Reverse" option for morphological analyzer.
If the reverse option is set, input the word surface instead of the reading.
Command Specific Options:
converter_test [-i id.def] [-o mecab|wakati|yomi]
Execute Kana-Kanji Conversion or Morphological Analysis on the Command Line.
-i id.def: Mozc format class id definition file path. default: data/id.def
-o output: Output format: one of "mecab", "wakati", "yomi". default: mecab
stakk_test [-m predict|spell] [-t threshold]
Execute Prediction or Spelling Correction on Command Line.
-m mode: Retrieval mode: one of "predict", "spell". default: spell
-t threshold: Spelling correction threshold of edit distance. default: 1
stakk_server [-i id.def] [-p port]
Start HTTP API Server.
-p port: Port number of HTTP API server.
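To make the edit-distance threshold of the spell mode concrete, here is a small Python sketch that filters candidates by distance. This README does not say which edit distance StaKK uses, so the sketch assumes optimal string alignment distance (insertion, deletion, substitution, and adjacent transposition each cost 1). The function names osa_distance and within_threshold are hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "めろ" <-> "ろめ".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def within_threshold(query, candidates, threshold=1):
    """Keep the candidates whose distance to the query is at most `threshold`."""
    return [c for c in candidates if osa_distance(query, c) <= threshold]


if __name__ == "__main__":
    readings = ["れみおろめん", "あみおだろん"]
    print(within_threshold("れみおめろん", readings, threshold=1))
    print(within_threshold("れみおめろん", readings, threshold=2))
```

Under this assumed metric, れみおろめん is one edit away from れみおめろん and あみおだろん is two edits away, which is consistent with the HTTP API spell example later in this README. That is only an observation about the sample output, not a statement about StaKK's internals.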
HTTP API Server Path:
General Format
/apiname/method/query[options]
apiname: Any name is acceptable. Recommended: api
method: One of "convert", "predict", "spell".
query: Input Query of Japanese String.
Convert Method
/apiname/convert/query/format
format: One of "wakati", "mecab", "yomi". default: wakati
Predict Method
/apiname/predict/query/number
number: Maximum Display Number of Candidates. default: 50
Spell Method
/apiname/spell/query/threshold/number
threshold: Threshold of Edit Distance. default: 1
DON'T set this larger than 2; doing so will bring down the server.
number: Maximum Display Number of Candidates. default: 50
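The path layout above can be assembled programmatically. The following Python sketch builds request URLs for the three methods. The helper name build_url and its parameters are hypothetical. Whether the server decodes percent-escaped paths is not stated in this README, so percent-encoding is off by default and the query is placed in the path as raw UTF-8.

```python
from urllib.parse import quote

METHODS = ("convert", "predict", "spell")


def build_url(base, method, query, *options, apiname="api", encode=False):
    """Build /apiname/method/query[/options...] on top of `base`.

    `options` are appended in order, e.g. a format for convert, a number
    for predict, or a threshold and then a number for spell.
    """
    if method not in METHODS:
        raise ValueError(f"method must be one of {METHODS}")
    # The README warns against a spell threshold larger than 2.
    if method == "spell" and options and int(options[0]) > 2:
        raise ValueError("spell threshold must not be larger than 2")
    parts = [apiname, method, query] + [str(o) for o in options]
    if encode:
        # Assumption: only useful if the server decodes percent-escapes.
        parts = [quote(p, safe="") for p in parts]
    return base.rstrip("/") + "/" + "/".join(parts)


if __name__ == "__main__":
    print(build_url("http://localhost:50000", "convert", "わたしのなまえはなかのです。"))
    print(build_url("http://localhost:50000", "predict", "きょうの", 10))
    print(build_url("http://localhost:50000", "spell", "れみおめろん", 2))
```

The guard on the spell threshold simply enforces the warning above on the client side, so a mistyped value never reaches the server.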
* Development Environment
CentOS 5.5
gcc 4.1.2
* Version
v1.0 2010/11/23 first release.
* Examples
# Compilation
$ make
# Kana-Kanji Conversion
$ ./converter_test -o wakati
loading dictionary
loading connection
loading id definition
input query:
わたしのなまえはなかのです。
私 の 名前 は 中野 です 。
# Predictive Input
$ ./stakk_test -m predict
loading dictionary
loading connection
input query:
あり (Input)
ありがとう ありがとう
ありがとう 有難う
ありがと ありがと
ありあんつかさいかいじょうほけん アリアンツ火災海上保険
ありえってぃ アリエッティ
etc.
# Spelling Correction
$ ./stakk_test -m spell
loading dictionary
loading connection
input query:
れみおめろん (Input)
れみおろめん レミオロメン
あみおだろん アミオダロン
ぐーgる (Input)
ぐーぐる グーグル
ぐーる グール
etc.
# Morphological Analyzer
$ ./converter_test -r -o mecab
loading dictionary
loading connection
loading id definition
input query:
東京都に住む
東京都 とうきょうと 名詞,固有名詞,地域,一般,*,*,都名 名詞,接尾,地域,*,*,*,*
に に 助詞,格助詞,一般,*,*,*,に 助詞,格助詞,一般,*,*,*,に
住む すむ 動詞,自立,*,*,五段動詞,基本形,5 動詞,自立,*,*,五段動詞,基本形,5
EOS
# HTTP API Server
$ ./stakk_server -p 50000 &
loading dictionary
loading connection
loading id definition
server ready
$ curl "http://localhost:50000/api/convert/わたしのなまえはなかのです。"
私 の 名前 は 中野 です 。
$ curl "http://localhost:50000/api/predict/きょうの"
きょうのうんせい 今日の運勢
きょうのてんき 今日の天気
きょうのばんぐみ 今日の番組
きょうのひとこと 今日の一言
etc.
$ curl "http://localhost:50000/api/spell/れみおめろん"
れみおろめん レミオロメン
$ curl "http://localhost:50000/api/spell/れみおめろん/2"
れみおろめん レミオロメン
あみおだろん アミオダロン
# HTTP API Server with Reverse Mode
$ ./stakk_server -rp 50001 &
loading dictionary
loading connection
loading id definition
server ready
$ curl "http://localhost:50001/api/convert/東京都に住む/mecab"
東京都 とうきょうと 名詞,固有名詞,地域,一般,*,*,都名 名詞,接尾,地域,*,*,*,*
に に 助詞,格助詞,一般,*,*,*,に 助詞,格助詞,一般,*,*,*,に
住む すむ 動詞,自立,*,*,五段動詞,基本形,5 動詞,自立,*,*,五段動詞,基本形,5
EOS
$ curl "http://localhost:50001/api/spell/テソション"
てんしょん テンション
ていしょん テイション
てーしょん テーション
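The mecab-format output shown in the morphological analyzer examples above can be post-processed with a few lines of Python. The sketch below assumes each line is whitespace-separated with the surface first, the reading second, and part-of-speech feature strings after that, and that the analysis ends with an EOS line, as in the sample output. The column interpretation and the names Token and parse_mecab_output are assumptions, not part of StaKK.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str           # first column, e.g. 東京都
    reading: str           # second column, e.g. とうきょうと
    features: List[str]    # remaining columns (POS feature strings)


def parse_mecab_output(text):
    """Parse mecab-format output into a list of Token, stopping at EOS."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "EOS":
            break
        columns = line.split()
        if len(columns) < 2:
            continue  # skip lines that do not look like token rows
        tokens.append(Token(columns[0], columns[1], columns[2:]))
    return tokens


if __name__ == "__main__":
    sample = (
        "東京都 とうきょうと 名詞,固有名詞,地域,一般,*,*,都名 名詞,接尾,地域,*,*,*,*\n"
        "に に 助詞,格助詞,一般,*,*,*,に 助詞,格助詞,一般,*,*,*,に\n"
        "住む すむ 動詞,自立,*,*,五段動詞,基本形,5 動詞,自立,*,*,五段動詞,基本形,5\n"
        "EOS\n"
    )
    for token in parse_mecab_output(sample):
        print(token.surface, token.reading, token.features[0].split(",")[0])
```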