ilotunimi

Toki Pona Tokenizer/Detokenizer

how to use

installation

from GitHub (latest)

pip install git+https://github.com/nymwa/ilonimi

from PyPI (easier)

pip install ilonimi

tokenization

nimi tu < text.txt > tokenized.txt

optional arguments:
  -h, --help        show this help message and exit
  --no-tokenize     without tokenization
  --no-normalize    without normalization
  --split           split decimal numbers into digits and proper nouns into syllables
  --no-sharp        without ## marks on split numbers and proper nouns
  --convert-unk     convert unknown word into <proper>
  --convert-number  convert decimal number into <number>
  --convert-proper  convert proper noun into <proper>
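
To make the effect of --split and --no-sharp easier to picture, here is a minimal Python sketch of digit splitting. It is not the ilonimi implementation: the function name `split_number` is hypothetical, and placing `##` in front of every piece after the first is an assumption borrowed from common subword tokenizers, since the help text only says that a `##` mark is used.

```python
import re

def split_number(token, sharp=True):
    """Split a decimal number token into one piece per digit (illustration only)."""
    if not re.fullmatch(r"[0-9]+", token):
        return [token]  # not a plain decimal number: leave it untouched
    # Assumption: continuation pieces carry a leading "##"; the first digit stays bare.
    prefix = "##" if sharp else ""
    return [token[0]] + [prefix + digit for digit in token[1:]]

print(split_number("2024"))               # ['2', '##0', '##2', '##4']
print(split_number("2024", sharp=False))  # ['2', '0', '2', '4']
```

With the marks in place, the pieces still record where the original number began, which is what lets a later step put it back together.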

detokenization

nimi wan < tokenized.txt > detokenized.txt

optional arguments:
  -h, --help  show this help message and exit
  --merge     merge split numbers and proper nouns
  --no-sharp  merge numbers and proper nouns without ## marks
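
The idea behind --merge can be sketched in a few lines of Python. This is an illustration rather than the tool's code: `merge_pieces` is a hypothetical name, and the sketch assumes that every continuation piece starts with `##`, which the help text does not spell out.

```python
def merge_pieces(tokens):
    """Glue every '##'-prefixed piece onto the token before it (illustration only)."""
    merged = []
    for token in tokens:
        if token.startswith("##") and merged:
            merged[-1] += token[2:]  # drop the mark and append to the previous token
        else:
            merged.append(token)     # ordinary token, or a stray mark at the start
    return merged

print(merge_pieces(["mi", "jo", "e", "2", "##0", "##2", "##4"]))
# ['mi', 'jo', 'e', '2024']
```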

kanaization

nimi kana < text.txt > kanaized.txt

optional arguments:
  -h, --help       show this help message and exit
  --no-link        without linking (e.g. ろなら -> ろんあら)
  --no-palatalize  without palatalization (e.g. やんそにゃ -> やんそんや)
  --no-comma       delete 「、」
  --space-period   replace 「。」 with 「　」
  --space-colon    replace 「：」 with 「　」
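
The two examples in the help text suggest what linking and palatalization do to a word-final ん. The Python sketch below reproduces only those two examples. The function name `join_after_n` and both lookup tables are assumptions made for illustration; the rules and kana table that ilonimi actually uses may differ.

```python
VOWELS = "あいうえお"
N_ROW = dict(zip(VOWELS, "なにぬねの"))               # ん + vowel kana -> な-row kana
PALATAL = {"や": "にゃ", "ゆ": "にゅ", "よ": "にょ"}  # ん + や-row kana -> palatalized kana

def join_after_n(kana, link=True, palatalize=True):
    """Fuse ん with a following vowel or や-row kana (illustration only)."""
    out = []
    i = 0
    while i < len(kana):
        ch = kana[i]
        nxt = kana[i + 1] if i + 1 < len(kana) else ""
        if ch == "ん" and link and nxt in N_ROW:
            out.append(N_ROW[nxt])      # linking: ろんあら -> ろなら
            i += 2
        elif ch == "ん" and palatalize and nxt in PALATAL:
            out.append(PALATAL[nxt])    # palatalization: やんそんや -> やんそにゃ
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(join_after_n("ろんあら"))                       # ろなら
print(join_after_n("ろんあら", link=False))           # ろんあら
print(join_after_n("やんそんや"))                     # やんそにゃ
print(join_after_n("やんそんや", palatalize=False))   # やんそんや
```

Passing `link=False` or `palatalize=False` mirrors what --no-link and --no-palatalize appear to do: the ん is kept as a separate kana.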
