Studio fails to align when a word is "eaten up" by g2p. #27

Open
joanise opened this issue Mar 31, 2020 · 0 comments

joanise commented Mar 31, 2020

Currently, readalongs align fails when the g2p module converts a word to an empty string.

Error message:

ERROR - Alignment produced a different number of segments and tokens, please examine dictionary and input audio and text.

To reproduce this error, check out commit 76faf18 in g2p (or any commit from before the disappearing "s" problem is fixed in the French g2p), go to OpenSamples, and run:

readalongs align -i -s -f -l fra UDHR-Librivox/human_rights_un_frn-preamble.txt UDHR-Librivox/human_rights_un_frn_ezwa_64kb-preamble.mp3 output/UDHR-fra-preamble

The error in this specific example is caused by the word <w>s</w> (the 330th token in UDHR-fra-preamble.tokenized.xml, on line 37), which turns into an empty string because my g2p rule erases word-final "s", even when, as here, the whole word is "s". As a consequence, the file UDHR-fra-preamble.dict skips from token t0b0d0p10s0w42 to t0b0d0p10s0w44, bypassing the empty token t0b0d0p10s0w43, which causes a mismatch between the number of tokens and the number of dictionary entries.
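
To make the failure mode concrete, here is a minimal Python sketch of how one might spot tokens that g2p turns into empty strings before alignment. It is not Studio or g2p code: `toy_french_g2p` and `find_empty_tokens` are hypothetical names, the rule is a simplified stand-in for the real French mapping, and the texts of the neighbouring tokens are placeholders rather than the actual words in the sample.

```python
import re


def toy_french_g2p(word: str) -> str:
    """Hypothetical stand-in for a g2p rule that erases word-final "s"."""
    return re.sub(r"s$", "", word)


def find_empty_tokens(tokens, g2p):
    """Return the (id, text) pairs whose g2p output is empty."""
    return [(tok_id, text) for tok_id, text in tokens if not g2p(text).strip()]


# Token IDs are taken from the issue; the texts for w42 and w44 are
# placeholders, not the actual words in the sample file.
tokens = [
    ("t0b0d0p10s0w42", "avant"),
    ("t0b0d0p10s0w43", "s"),
    ("t0b0d0p10s0w44", "est"),
]

empty = find_empty_tokens(tokens, toy_french_g2p)
for tok_id, text in empty:
    print(f"{tok_id}: {text!r} has an empty phonetic representation")
```

A check like this, run before the dictionary is written, would identify the exact token that is about to go missing.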

Eventually, I'll fix the French g2p to not swallow "s", but Studio needs to handle this case gracefully. Options:

  1. Consider it an error and output a meaningful message telling the user to edit the g2p. This is not a great option for general users who might not know how to edit the g2p, though.
  2. Fix the aligner code to align the whole text anyway, cleanly skipping over (or otherwise handling) the word with an empty phonetic representation.
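
Both options can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Studio aligner: `split_alignable`, `attach_segments`, the `strict` flag, and the (start, end) segment tuples are all hypothetical, and the g2p function is a toy rule that drops word-final "s".

```python
import re


def toy_g2p(word: str) -> str:
    """Hypothetical g2p rule: drop a word-final "s"."""
    return re.sub(r"s$", "", word)


def split_alignable(tokens, g2p, strict=False):
    """Separate tokens with a pronunciation from tokens g2p emptied.

    strict=True corresponds to option 1 (fail with a meaningful message);
    strict=False corresponds to option 2 (skip the empty tokens).
    """
    alignable, skipped = [], []
    for tok_id, text in tokens:
        pron = g2p(text).strip()
        if pron:
            alignable.append((tok_id, pron))
        else:
            skipped.append((tok_id, text))
    if strict and skipped:
        details = ", ".join(f"{tok_id} ({text!r})" for tok_id, text in skipped)
        raise ValueError(f"g2p produced an empty string for: {details}")
    return alignable, skipped


def attach_segments(alignable, segments):
    """Map aligner segments back to token IDs; skipped tokens get none."""
    if len(alignable) != len(segments):
        raise ValueError("segment count does not match alignable tokens")
    return {tok_id: seg for (tok_id, _), seg in zip(alignable, segments)}


tokens = [("w42", "avant"), ("w43", "s"), ("w44", "est")]
alignable, skipped = split_alignable(tokens, toy_g2p)
# Made-up (start, end) times in seconds, standing in for aligner output.
timings = attach_segments(alignable, [(0.0, 0.4), (0.4, 0.7)])
print(timings)
print("skipped:", skipped)
```

Counting segments against only the alignable tokens avoids the mismatch, and keeping the skipped list lets the tool warn about the affected words instead of failing outright.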
joanise changed the title from 'Studio fails to align when a word is "eaten" up by g2p.' to 'Studio fails to align when a word is "eaten up" by g2p.' on Mar 31, 2020