
mecab-ipadic does not contain 機能 nor 作用 #14

Open
LanaTimko opened this issue Nov 29, 2021 · 5 comments
@LanaTimko

Hello @nicolas-raoul,

We use your library in our product to transliterate kanji to romaji.
In some cases the results are incorrect:

  • 機能仕様書 is transliterated to: Ji Neng shiyousho
    The 機能 part was converted to “Ji Neng”, which is Chinese pinyin, not romaji (in romaji it should be: kinō).

  • 作用 is transliterated to: Zuo Yong (in romaji it should be: sayō)

Would it be possible to fix the library so that it transliterates kanji to romaji more reliably?

Thanks in advance,
Lana

@nicolas-raoul
Owner

Hello Lana,

Wow, that's a bug. Thanks for letting us know, and please report any other similar problems you find.

@nicolas-raoul
Owner

Interestingly, when I just tried, these words were left as-is:

$ ./jakaroma.sh 機能仕様書
機能Shiyo- Sho
$ ./jakaroma.sh 機能
機能
$ ./jakaroma.sh 作用
作用

which is not great either, but arguably better than outputting mistaken romaji.

Are you using the code found in the GitHub master branch? Or did you modify it somehow? For instance did you switch to another dictionary?

@LanaTimko
Author

Hello Nicolas,

Yes, we use the Maven version of your library built from master, and we didn't change anything.
So we use your standard dictionary.

@LanaTimko
Author

LanaTimko commented Jan 10, 2022


As I wrote earlier, we use your standard dictionary and the latest master version without any changes.
However, our product uses Chinese transliteration for kanji by default. For Japanese customers, a special property switches this behavior so that your library is applied instead. If your library cannot transliterate a word (as in these examples), our default behavior takes over: that is why you saw the words left as-is, while we got Chinese transliteration.
It would help us a lot if you could fix this issue so that these words (機能, 作用) are transliterated to romaji correctly. For now we are going to apply a workaround in our product, and we look forward to your fix so that we can implement the proper behavior.
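The fallback described above hinges on noticing that the Japanese romanizer left kanji untouched. A minimal Java sketch of such a check, assuming "untransliterated" simply means "the output still contains Han-script characters"; the class and method names are hypothetical and come from neither jakaroma nor the reporter's product:

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch: not jakaroma's code, nor the reporter's product code.
public class TransliterationFallback {

    // True if the string still contains at least one Han (kanji) code point,
    // i.e. the Japanese romanizer left part of the input as-is.
    static boolean containsHan(String s) {
        return s.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    // Try the Japanese romanizer first; only if kanji remain in its output,
    // hand the original input to the default (e.g. Chinese pinyin) transliterator.
    static String transliterate(String input,
                                UnaryOperator<String> japanese,
                                UnaryOperator<String> fallback) {
        String romaji = japanese.apply(input);
        return containsHan(romaji) ? fallback.apply(input) : romaji;
    }

    public static void main(String[] args) {
        // Output reported earlier in this thread for 機能仕様書: kanji are left over.
        System.out.println(containsHan("機能Shiyo- Sho"));
    }
}
```

Because this check only sees leftover kanji, any word missing from the dictionary (such as 機能 here) is still routed to the Chinese transliterator; flagging such words instead would avoid pinyin appearing in Japanese output.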

Thanks in advance!
Lana

@nicolas-raoul
Owner

nicolas-raoul commented Jan 11, 2022

I just downloaded the dictionary http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz (EUC-JP) and found that Noun.csv contains 仕様 but contains neither 機能 nor 作用 as a single noun. That is probably the problem.
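The dictionary check above can be reproduced with a short Java program. Since mecab-ipadic spreads its entries over several CSV files (Noun.csv is one of them), this sketch scans every .csv file in the unpacked archive and reports where each word appears as a surface form, i.e. the first comma-separated field of an entry. The default directory name is an assumption; pass the real path as the first argument.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: look up words by surface form in IPADIC-style CSV dictionary files.
public class IpadicLookup {

    // Collect the first field (surface form) of every CSV line.
    static Set<String> surfaceForms(List<String> lines) {
        Set<String> forms = new HashSet<>();
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma > 0) {
                forms.add(line.substring(0, comma));
            }
        }
        return forms;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the unpacked mecab-ipadic-2.7.0-20070801 archive.
        Path dir = Path.of(args.length > 0 ? args[0] : "mecab-ipadic-2.7.0-20070801");
        Charset eucJp = Charset.forName("EUC-JP"); // the source CSV files are EUC-JP encoded
        String[] words = {"仕様", "機能", "作用"};
        try (DirectoryStream<Path> csvFiles = Files.newDirectoryStream(dir, "*.csv")) {
            for (Path csv : csvFiles) {
                Set<String> forms = surfaceForms(Files.readAllLines(csv, eucJp));
                for (String word : words) {
                    if (forms.contains(word)) {
                        System.out.println(word + " found in " + csv.getFileName());
                    }
                }
            }
        }
    }
}
```

A word that shows up in none of the files confirms the dictionary gap; a word found in a file other than Noun.csv would point to a different cause.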

Unfortunately I am currently working on other projects, but could you please try to find an updated version of that dictionary? Alternatively, could you find out how to add new words to it? Please post your findings here. Thanks a lot!
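On the question of adding words: assuming jakaroma tokenizes with Kuromoji's IPADIC tokenizer (the atilika.com download link suggests so), Kuromoji accepts a user dictionary in a simple CSV format: surface form, space-separated segmentation, space-separated katakana readings, and a part-of-speech label. The sketch below only generates such entries for the two missing words; the Entry type is hypothetical. Loading the resulting file through the userDictionary(...) method of Kuromoji's Tokenizer.Builder, and whether jakaroma then uses the supplied reading, are assumptions to verify against the Kuromoji version in use.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: produce lines in Kuromoji's user-dictionary CSV format.
public class UserDictionarySketch {

    // Hypothetical entry type: surface form, katakana reading, POS label.
    record Entry(String surface, String reading, String partOfSpeech) {
        // Each word is kept as a single segment, so the segmentation and
        // reading fields each contain exactly one item.
        String toCsvLine() {
            return String.join(",", surface, surface, reading, partOfSpeech);
        }
    }

    public static void main(String[] args) {
        // "カスタム名詞" ("custom noun") is an arbitrary label for user entries.
        List<Entry> entries = List.of(
                new Entry("機能", "キノウ", "カスタム名詞"),
                new Entry("作用", "サヨウ", "カスタム名詞"));
        String dictionary = entries.stream()
                .map(Entry::toCsvLine)
                .collect(Collectors.joining("\n", "", "\n"));
        System.out.print(dictionary);
    }
}
```

Keeping each word as a single segment mirrors what is missing from Noun.csv: one noun entry per word with its reading.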

@nicolas-raoul nicolas-raoul changed the title Kanji transliteration does not work correctly in some cases mecab-ipadic does not contain 機能 nor 作用 Jan 11, 2022