一匹 is translated to "一Hiki" #7

Open
ScoreUnder opened this issue Nov 17, 2017 · 2 comments
@ScoreUnder

As the title says.

I've been looking for a library to break kanji down into their readings (preferably hiragana), and my first test for each candidate is to see how it fares with the 〜匹 counters.

For reference, this is the expected output for the first 10:

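| 漢字 | ひらがな | Ro-maji |
|---|---|---|
| 一匹 | いっぴき | Ippiki |
| 二匹 | にひき | Nihiki |
| 三匹 | さんびき | Sanbiki |
| 四匹 | よんひき | Yonhiki |
| 五匹 | ごひき | Gohiki |
| 六匹 | ろっぴき | Roppiki |
| 七匹 | ななひき | Nanahiki |
| 八匹 | はっぴき | Happiki |
| 九匹 | きゅうひき | Kyuuhiki |
| 十匹 | じゅっぴき | Juppiki |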
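
For illustration, the expected readings listed above can be written down as a small lookup. This is only a sketch of the target behaviour for 1 to 10, not how Jakaroma or Kuromoji work internally; `HikiCounter` and `reading` are made-up names.

```java
class HikiCounter {
    // Readings of the numbers 1-10 as they appear before 匹, including the
    // contracted forms (いっ, ろっ, はっ, じゅっ) from the expected output.
    private static final String[] NUMBER = {
        "いっ", "に", "さん", "よん", "ご", "ろっ", "なな", "はっ", "きゅう", "じゅっ"
    };

    // Reading of 匹 itself after each number: ひき, びき or ぴき.
    private static final String[] COUNTER = {
        "ぴき", "ひき", "びき", "ひき", "ひき", "ぴき", "ひき", "ぴき", "ひき", "ぴき"
    };

    /** Returns the hiragana reading of n匹 for 1 <= n <= 10. */
    static String reading(int n) {
        if (n < 1 || n > 10) {
            throw new IllegalArgumentException("only 1 to 10 are covered: " + n);
        }
        return NUMBER[n - 1] + COUNTER[n - 1];
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println(n + "匹 " + reading(n));
        }
    }
}
```

The point of the two arrays is that both halves change: the number can contract (いち to いっ) and the counter can shift (ひき to ぴき or びき), so a per-character reading cannot produce these forms.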

However, this is the output the program creates:

% ./jakaroma.sh '一匹 二匹 三匹 四匹 五匹 六匹 七匹 八匹 九匹 十匹 1匹 2匹 3匹 4匹 5匹 6匹 7匹 8匹 9匹 10匹'
一Hiki  二Hiki  三Hiki  四Hiki  五Hiki  六Hiki  七Hiki  八Hiki  九Hiki  十Hiki  1Hiki  2Hiki  3Hiki  4Hiki  5Hiki  6Hiki  7Hiki  8Hiki  9Hiki  10Hiki 
@nicolas-raoul
Owner

nicolas-raoul commented Nov 20, 2017

Thanks for the detailed feedback!
Do you know whether the same problem appears in Kuromoji?

  • If yes, then the problem should be reported to Kuromoji, so that they can fix it.
  • If no, we should figure out what we are doing wrong and fix our code.

On the other hand, I can imagine cases where someone would prefer 一丁目 to be transliterated as 1chome rather than Icchome.
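
A rough sketch of what such a preference could look like as a preprocessing step: replace a single leading kanji numeral with Arabic digits before the rest of the text is romanized. `NumeralStyle` and `leadingNumeralToDigits` are hypothetical names, nothing like this exists in the project today, and compound numbers such as 二十 are deliberately left alone.

```java
class NumeralStyle {
    private static final String KANJI_DIGITS = "一二三四五六七八九";
    private static final String ANY_NUMERAL = "一二三四五六七八九十百千万";

    /**
     * Replaces one leading kanji numeral (一 to 九, or 十) with Arabic digits.
     * Text that starts with a compound number, or with no numeral, is returned unchanged.
     */
    static String leadingNumeralToDigits(String text) {
        if (text.isEmpty()) {
            return text;
        }
        // A second numeral means a compound number (e.g. 二十), which is out of scope here.
        if (text.length() > 1 && ANY_NUMERAL.indexOf(text.charAt(1)) >= 0) {
            return text;
        }
        char first = text.charAt(0);
        if (first == '十') {
            return "10" + text.substring(1);
        }
        int index = KANJI_DIGITS.indexOf(first);
        if (index < 0) {
            return text;
        }
        return (index + 1) + text.substring(1);
    }
}
```

With this, 一丁目 becomes 1丁目, and the remaining 丁目 would still have to go through the normal romanization path.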

Another problem is that the program leaves kanji in its output (such as 五Hiki in your example). I am not sure why, but that is a big problem indeed.
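
To catch this kind of leak in a test, one option is to check the romanized output for any remaining Han characters. A minimal sketch using only the JDK; `KanjiCheck` and `containsKanji` are made-up names, not part of the project.

```java
class KanjiCheck {
    /** Returns true if the text contains at least one code point in the Han script. */
    static boolean containsKanji(String text) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsKanji("五Hiki"));  // kanji left in the output
        System.out.println(containsKanji("Gohiki"));  // fully romanized
    }
}
```

A regression test could then assert that `containsKanji` is false for the output of every sample input, including the counters in this issue.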

Cheers!

@nicolas-raoul
Owner

Related: atilika/kuromoji#125

Apparently, switching to UniDic (which might be as simple as modifying pom.xml) would solve that particular case, but it might perform worse in other areas.
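
For reference, the pom.xml change might look roughly like the fragment below. This assumes the project currently depends on the `kuromoji-ipadic` artifact and that version 0.9.0 is the one in use; both are assumptions to check against the actual pom.xml.

```xml
<!-- Assumption: this replaces an existing kuromoji-ipadic dependency. -->
<dependency>
  <groupId>com.atilika.kuromoji</groupId>
  <artifactId>kuromoji-unidic</artifactId>
  <version>0.9.0</version>
</dependency>
```

The Java imports would presumably have to move from `com.atilika.kuromoji.ipadic` to `com.atilika.kuromoji.unidic` as well, and the accessor methods on `Token` may differ between the two dictionaries, so it may not be a pure configuration change.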
