mecabwrap is yet another Python interface to the MeCab morphological analyzer.
Its goal is to provide intuitive APIs that work on Unix and Windows machines seamlessly.
- Python 2.7+ or 3.4+ (may also work on older versions)
- MeCab 0.996
Ubuntu
$ sudo apt-get install mecab libmecab-dev mecab-ipadic-utf8
Mac OSX
$ brew install mecab mecab-ipadic
Windows
Download and run the installer.
See also: official website
Install from PyPI
$ pip install mecabwrap
Or install from GitHub
$ git clone --depth 1 https://github.com/kota7/mecabwrap-py.git
$ cd mecabwrap-py
$ pip install -U .
The following command should print the MeCab version. If it fails, either MeCab is not installed or it is not on the search path.
$ mecab -v
# should print `mecab of 0.996` or similar.
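The same check can be scripted from Python, which may help when the shell and the Python process see different search paths. This is a minimal sketch, not part of mecabwrap: the helper name `mecab_version` is hypothetical, and it assumes Python 3.3+ (for `shutil.which`) and that the executable is named `mecab`.

```python
import shutil
import subprocess


def mecab_version(executable="mecab"):
    """Return the output of `<executable> -v`, or None if it is not on the search path.

    Hypothetical helper for illustration; not provided by mecabwrap.
    """
    path = shutil.which(executable)
    if path is None:
        # The executable could not be found on the search path
        return None
    result = subprocess.run(
        [path, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case the version goes to stderr
        universal_newlines=True,
    )
    return result.stdout.strip()


version = mecab_version()
if version is None:
    print("MeCab was not found on the search path")
else:
    print(version)
```

If this reports that MeCab was not found while `mecab -v` works in a terminal, the two environments are probably using different `PATH` settings.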
To verify that the package is successfully installed, try the following:
$ python
>>> from mecabwrap import tokenize, print_token
>>> for token in tokenize(u"すもももももももものうち"):
...     print_token(token)
...
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
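Each output line above pairs a surface form with a comma-separated feature string. With the IPADIC dictionary, that string holds nine columns: part of speech, three part-of-speech subcategories, inflection type, inflection form, base form, reading, and pronunciation. The sketch below splits such a string into named fields using only the standard library. The field names and the `parse_feature` helper are illustrative assumptions, not part of the mecabwrap API, and other dictionaries may use a different column layout.

```python
# Hypothetical labels for the nine IPADIC feature columns
IPADIC_FIELDS = (
    "pos", "pos1", "pos2", "pos3",
    "infl_type", "infl_form",
    "baseform", "reading", "pronunciation",
)


def parse_feature(feature):
    """Map a comma-separated IPADIC feature string to a dict of named fields.

    Entries with fewer columns (for example, unknown words) simply yield
    fewer keys, because zip stops at the shorter sequence.
    """
    return dict(zip(IPADIC_FIELDS, feature.split(",")))


info = parse_feature("名詞,一般,*,*,*,*,すもも,スモモ,スモモ")
print(info["pos"], info["baseform"], info["reading"])
```

A `*` in a column means the field does not apply to that token, so callers may want to treat it as missing.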
See the example notebook (or a cleaner version on nbviewer) for more detail.