
Umlaute are removed #6

Open
luminosuslight opened this issue Jan 5, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@luminosuslight

luminosuslight commented Jan 5, 2021

I think umlaut characters (äüößÄÜÖ) are currently just being removed from the input texts instead of getting their own symbol ids or being replaced by similar ASCII encodings ('ae', 'ue', 'oe', 'ss', ...). Even though I guess the neural network learns to pronounce 'fnf' as 'fünf', I think the performance could be improved by fixing this.
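To illustrate the ASCII fallback mentioned above, here is a minimal Python sketch. The mapping and the `replace_umlauts` name are hypothetical and not part of german_transliterate or this project. Mapping capital umlauts to 'Ae', 'Oe', 'Ue' is an assumed convention.

```python
# Hypothetical umlaut-to-ASCII mapping; not taken from german_transliterate.
UMLAUT_TO_ASCII = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

# str.maketrans accepts a dict of single characters mapped to replacement strings.
_UMLAUT_TABLE = str.maketrans(UMLAUT_TO_ASCII)


def replace_umlauts(text: str) -> str:
    """Replace German umlauts and ß with conventional ASCII digraphs."""
    return text.translate(_UMLAUT_TABLE)


print(replace_umlauts("fünf Größen"))  # → fuenf Groessen
```

This keeps the existing symbol set unchanged, at the cost of making, for example, 'ue' in "Feuer" and 'ü' in "für" look identical to the model.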

The background is that german_transliterate doesn't actually change the umlaut characters, even though it states that it 'replaces Unicode symbols with ASCII characters'. They are still in the string afterwards, and since there is no symbol id for them in symbol_to_id, they are simply left out of the resulting sequence.
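A minimal sketch of the failure mode described above, assuming the text-to-id conversion simply skips characters that have no entry in symbol_to_id. The ASCII-only symbol list and the `text_to_sequence` / `sequence_to_text` helpers are hypothetical stand-ins, not the project's actual code.

```python
# Hypothetical ASCII-only symbol list standing in for the project's ALL_SYMBOLS.
ALL_SYMBOLS = list("_-!'(),.:;? abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
symbol_to_id = {s: i for i, s in enumerate(ALL_SYMBOLS)}
id_to_symbol = {i: s for s, i in symbol_to_id.items()}


def text_to_sequence(text):
    # Characters without an entry in symbol_to_id are dropped without any warning.
    return [symbol_to_id[c] for c in text if c in symbol_to_id]


def sequence_to_text(seq):
    return "".join(id_to_symbol[i] for i in seq)


print(sequence_to_text(text_to_sequence("fünf")))  # → fnf
```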

A solution could be to append those characters to ALL_SYMBOLS to give them their own ids. Unfortunately, the network would probably have to be retrained after this change.
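A minimal sketch of the proposed fix, under the assumption that ids are assigned by position in ALL_SYMBOLS. The `BASE_SYMBOLS` list and the `text_to_sequence` helper are hypothetical. Appending, rather than inserting, keeps the ids of the existing symbols stable, but an already-trained model has never seen the new ids, which is consistent with the need to retrain.

```python
# Hypothetical ASCII-only base set standing in for the current ALL_SYMBOLS.
BASE_SYMBOLS = list("_-!'(),.:;? abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
UMLAUTS = list("äöüßÄÖÜ")

# Appending keeps every existing symbol at its old index; only new ids are added.
ALL_SYMBOLS = BASE_SYMBOLS + UMLAUTS
symbol_to_id = {s: i for i, s in enumerate(ALL_SYMBOLS)}


def text_to_sequence(text):
    # Unknown characters are still skipped, but umlauts now have their own ids.
    return [symbol_to_id[c] for c in text if c in symbol_to_id]


print(len(text_to_sequence("fünf")))  # → 4
```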

Please don't hesitate to tell me if I got something wrong and umlaut characters are being handled correctly.

[Edit: Thank you, Monatis and Thorsten, for this really great effort, regardless of this issue!]

@monatis
Owner

monatis commented Jan 5, 2021

@luminosuslight Actually you're right, unfortunately :D
I noticed this after training was complete, and that's why the model sometimes (not always) has difficulty with umlauts. Anyway, I'm retraining Tacotron2 (and then FastSpeech2 for mobile and embedded inference), and this issue will be fixed in those models. Thanks for the issue.

@monatis monatis self-assigned this Jan 5, 2021
@monatis monatis added the bug Something isn't working label Jan 5, 2021