Umlaute are removed #6

luminosuslight · 2021-01-05T07:16:27Z

I think umlaut characters (äüößÄÜÖ) are currently just removed from the input text instead of getting their own symbol id or being replaced by similar ASCII encodings ('ae', 'ue', 'oe', 'ss', ...). Even though I guess the neural network learns to pronounce 'fnf' as 'fünf', I think the performance could be improved by fixing this.
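As a rough illustration of the ASCII-replacement option, here is a minimal sketch. The names `UMLAUT_TO_ASCII` and `replace_umlauts` are hypothetical and not part of german_transliterate or this project; the mapping just follows the conventional 'ae', 'oe', 'ue', 'ss' digraphs.

```python
import unicodedata

# Hypothetical mapping, for illustration only (not taken from the project).
UMLAUT_TO_ASCII = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def replace_umlauts(text: str) -> str:
    """Replace each umlaut or ß with its conventional ASCII digraph."""
    # Normalize first so that 'u' + combining diaeresis becomes a single 'ü'.
    text = unicodedata.normalize("NFC", text)
    return "".join(UMLAUT_TO_ASCII.get(ch, ch) for ch in text)


print(replace_umlauts("fünf Äpfel, große Öfen"))  # fuenf Aepfel, grosse Oefen
```

A downside of this option is that 'ue' then stands for both a real 'ue' and 'ü', so the model would have to disambiguate from context.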

The background is that german_transliterate doesn't actually change the umlaut characters, even though it states that it 'replaces Unicode symbols with ASCII characters'. They are still in the string afterwards, and as there is no symbol id for them in symbol_to_id, they are simply left out of the resulting sequence.
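The dropping behavior can be reproduced with a small stand-in, assuming the text-to-sequence step simply skips characters that have no id. The symbol list here is deliberately tiny and the function name `text_to_sequence` is hypothetical; the project's real ALL_SYMBOLS and lookup code may differ.

```python
# Illustrative stand-in; the project's real ALL_SYMBOLS is larger and may differ.
ALL_SYMBOLS = list("abcdefghijklmnopqrstuvwxyz ")
symbol_to_id = {s: i for i, s in enumerate(ALL_SYMBOLS)}
id_to_symbol = {i: s for s, i in symbol_to_id.items()}


def text_to_sequence(text):
    """Map characters to ids, silently skipping any character without an id.

    The silent skip is an assumption about how the real pipeline behaves.
    """
    return [symbol_to_id[ch] for ch in text if ch in symbol_to_id]


sequence = text_to_sequence("fünf")
# Decoding the ids shows what the network actually receives.
print("".join(id_to_symbol[i] for i in sequence))  # fnf
```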

A solution could be to append those characters to ALL_SYMBOLS to give them their own ids. Unfortunately, the network probably has to be retrained after this change.
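A minimal sketch of that idea, again using a small stand-in symbol list rather than the project's real one. `BASE_SYMBOLS`, `UMLAUTS`, and `encode` are hypothetical names; only ALL_SYMBOLS and symbol_to_id come from the description above.

```python
# Stand-in for the existing symbols; the real list is an assumption here.
BASE_SYMBOLS = list("abcdefghijklmnopqrstuvwxyz ")
UMLAUTS = list("äöüßÄÖÜ")

# Appending at the end keeps the ids of all existing symbols unchanged.
ALL_SYMBOLS = BASE_SYMBOLS + UMLAUTS
symbol_to_id = {s: i for i, s in enumerate(ALL_SYMBOLS)}


def encode(text):
    """Map characters to ids, skipping characters that have no id."""
    return [symbol_to_id[ch] for ch in text if ch in symbol_to_id]


# All four characters of 'fünf' now survive, including the umlaut.
print(len(encode("fünf")))  # 4
```

Even though the old ids stay stable, the symbol set is larger, so a model trained on the old set presumably has no learned representation for the new ids, which matches the expectation that retraining is needed.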

Please don't hesitate to tell me if I got something wrong and umlaut characters are being handled correctly.

[Edit: Thank you, Monatis and Thorsten, for this really great effort, regardless of this issue!]


monatis · 2021-01-05T08:19:42Z

@luminosuslight Actually you're right, unfortunately :D
I noticed this after training was complete, which is why the model sometimes (not always) has difficulty with umlauts. Anyway, I'm retraining Tacotron2 (and then FastSpeech2 for mobile and embedded inference), and this issue will be fixed in those models. Thanks for the issue.

monatis self-assigned this Jan 5, 2021

monatis added the bug label Jan 5, 2021
