Mandarin Chinese TTS with the input text in Chinese characters #1754

vjdtao · 2020-03-31T18:56:18Z

I've reproduced the results in the CSMSC recipe. I'd like to use Chinese characters in the input text. For example, I randomly chose an utterance (009831) from CSMSC, and extracted 3 different annotations/labels as follows:
From CSMSC/ProsodyLabeling/000001-010000.txt (annotations 1 and 2):
至于#1当初#1报考#1南科大#3，他也#1只是想#1逃避#1高考#3，随便#1考着#1玩玩#4。
zhi4 yu2 dang1 chu1 bao4 kao3 nan2 ke1 da4 ta1 ye2 zhi3 shi4 xiang3 tao2 bi4 gao1 kao3 sui2 bian4 kao3 zhe5 wan2 wan5

From CSMSC/PhoneLabeling/009831.interval (annotation 3):
zh iii4 v2 d ang1 ch u1 b ao4 k ao3 n an2 k e1 d a4 sp1 t a1 ie2 zh iii3 sh iii4 x iang3 t ao2 b i4 g ao1 k ao3 sp1 s uei2 b ian4 k ao3 zh e5 uan2 uan5

The recipe uses the 3rd annotation, extracted from the .interval file, both as the phone units in the dict file (data/lang_phn/train_no_dev_units.txt) and as the text for model training and decoding. However, natural Chinese input text looks like the 1st annotation above without the prosodic marks:
至于当初报考南科大，他也只是想逃避高考，随便考着玩玩。
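
As a minimal sketch of how the plain text relates to the 1st annotation, the prosodic marks can be stripped with a regular expression. This assumes every mark is a `#` followed by a single digit, as in the example above; `strip_prosody_marks` is a hypothetical helper name, not part of the recipe.

```python
import re

# Assumption: prosodic boundary marks are '#' followed by one digit (#1 .. #4).
PROSODY_MARK = re.compile(r"#\d")


def strip_prosody_marks(labeled: str) -> str:
    """Remove '#N' prosodic marks, leaving the plain characters and punctuation."""
    return PROSODY_MARK.sub("", labeled)


labeled = "至于#1当初#1报考#1南科大#3，他也#1只是想#1逃避#1高考#3，随便#1考着#1玩玩#4。"
plain = strip_prosody_marks(labeled)
print(plain)
```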

So I took the idea from the Mandarin demo, and used pypinyin to convert the Chinese characters to Pinyin. The printout after the conversion:
Cleaned text: ['zhi4', 'yu2', 'dang1', 'chu1', 'bao4', 'kao3', 'nan2', 'ke1', 'da4', '，', 'ta1', 'ye3', 'zhi3', 'shi4', 'xiang3', 'tao2', 'bi4', 'gao1', 'kao3', '，', 'sui2', 'bian4', 'kao3', 'zhe', 'wan2', 'wan2', '。']
WARN: ü2 is not included in dict.
WARN: ， is not included in dict.
WARN: ， is not included in dict.
WARN: ui2 is not included in dict.
WARN: e is not included in dict.
WARN: 。 is not included in dict.

The converted result is similar to the 2nd annotation above, but it does not match the phone-level input text used for decoding in the recipe. The mismatch triggers the warnings and makes the synthesized speech lower in quality than the recipe's output. Does anyone have experience with this or suggestions for fixing it?
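
The warnings above are raised for tokens that are missing from the dict file. The following is a rough sketch of that kind of check, not the demo's actual code. It assumes the units file holds one `token index` pair per line, which is an assumption about the file layout. The tiny dict and token list are made up for illustration, and `load_units` and `find_oov` are hypothetical helper names.

```python
def load_units(lines):
    """Collect the token (first field) of each 'token index' line into a set."""
    units = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            units.add(fields[0])
    return units


def find_oov(tokens, units):
    """Return the tokens that are not in the unit set, in input order."""
    return [token for token in tokens if token not in units]


# Made-up excerpt standing in for a units file; real entries will differ.
dict_lines = ["<unk> 1", "zh 2", "iii4 3", "k 4", "ao3 5"]
units = load_units(dict_lines)

tokens = ["zh", "iii4", "，", "k", "ao3", "e"]
oov = find_oov(tokens, units)
for token in oov:
    print(f"WARN: {token} is not included in dict.")
```

Running a check like this over the converted text before decoding shows which symbols the frontend still has to map or drop.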


kan-bayashi · 2020-04-01T01:28:54Z

This is caused by a difference in the text frontend.
I use the phone labels provided by the CSMSC dataset, which differ slightly from the pypinyin output.
I'm not familiar with Chinese. What is the difference between the CSMSC phone labels and the pypinyin output?
One possible approach is to train the model on labels created from the prosody labels plus pypinyin.

unilight · 2020-04-01T07:44:14Z

Hi.
In https://github.com/espnet/espnet/blob/master/egs/vcc20/vc1_task2/local/clean_text_mandarin.py I have manually defined some rules to align the pypinyin output with the labels provided by the CSMSC dataset.
When these rules are used to parse the text in the VCC2020 dataset, no warnings appear.
If you use other datasets or your own input text, errors may still occur.

vjdtao · 2020-04-06T03:05:50Z

@kan-bayashi @unilight Thanks for your replies and helpful suggestions. The CSMSC phone labels look like a "finer version" of the pinyin labels that can be obtained with pypinyin. I guess the solution is to find the mapping between the CSMSC phone labels and the pinyin labels.
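
To make the idea of such a mapping concrete, here is a minimal sketch that covers only the correspondences visible in the example utterance in this thread (for instance `zhi4` to `zh iii4`, `yu2` to `v2`, `sui2` to `s uei2`). It is not the rule set in clean_text_mandarin.py and is far from complete. The tables and the `to_units` helper are hypothetical, and treating an unmarked tone as `5` is an assumption based on `zhe` versus `zh e5` above.

```python
# Initials, with two-letter ones first so "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

# Only the correspondences observed in the example utterance.
ZERO_INITIAL = {"yu": "v", "ye": "ie", "wan": "uan"}
FINAL_REWRITES = {"ui": "uei"}
RETROFLEX = {"zh", "ch", "sh"}


def to_units(syllable):
    """Split one tone-numbered pinyin syllable into CSMSC-style units."""
    if syllable[-1].isdigit():
        base, tone = syllable[:-1], syllable[-1]
    else:
        base, tone = syllable, "5"  # assumption: no digit means neutral tone
    if base in ZERO_INITIAL:
        return [ZERO_INITIAL[base] + tone]
    initial = next((i for i in INITIALS if base.startswith(i)), "")
    final = base[len(initial):]
    if initial in RETROFLEX and final == "i":
        final = "iii"  # zhi/chi/shi use "iii" in the labels above
    final = FINAL_REWRITES.get(final, final)
    return ([initial] if initial else []) + [final + tone]


syllables = ["zhi4", "yu2", "dang1", "sui2", "zhe", "wan2"]
units = [unit for s in syllables for unit in to_units(s)]
print(" ".join(units))
```

The sketch does not handle punctuation (the CSMSC label above shows `sp1` where the commas are) or tone differences such as `ye3` from pypinyin versus `ie2` in the CSMSC label, so a real mapping needs more rules than this.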

kan-bayashi · 2020-04-06T09:06:29Z

@unilight I updated the text frontend in the notebook using your code.
Thank you so much!

r9y9 · 2020-04-08T14:37:35Z

FYI, I just noticed that kakaobrain released a new g2p package for Mandarin Chinese: https://github.com/kakaobrain/g2pM. I'm not familiar with Mandarin, but the results look promising, and it is worth considering as a replacement for pypinyin. That could solve some of the pronunciation issues.

xyyimian · 2024-01-08T00:13:07Z

> FYI, I just noticed that kakaobrain released a new g2p package for Mandarin Chinese: https://github.com/kakaobrain/g2pM. I'm not familiar with Mandarin, but the results look promising, and it is worth considering as a replacement for pypinyin. That could solve some of the pronunciation issues.

FYI, there are some cases where g2pM is worse than pypinyin:
kakaobrain/g2pm#13

Also, for some characters, such as “优菈”, pypinyin can produce a conversion but g2pM cannot.

kan-bayashi added the Question label Apr 1, 2020

vjdtao closed this as completed Apr 6, 2020

kan-bayashi mentioned this issue Apr 7, 2020: Chinese Pinyin format #1781 (Closed)
