There were complaints that the m/f dataset was too small to draw conclusions from. The idea now is to train a phoneme-based TTS such as FastSpeech 2 with two different forced aligners and compare the synthetic and original voices using some similarity metric (e.g., PESQ). Training and test data would be FalaBrasil's Constituição dataset.
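A minimal sketch of how the PESQ comparison could look, assuming the Python `pesq` package (an ITU-T P.862 implementation) plus `librosa` for loading/resampling; the file paths and the 16 kHz wide-band mode are placeholders, not decisions made in this issue.

```python
# Hypothetical sketch: compare an original recording with the TTS output via PESQ.
# Assumes `pip install pesq librosa`; paths below are placeholders.
import librosa
from pesq import pesq

SR = 16000  # PESQ wide-band mode expects 16 kHz (narrow-band uses 8 kHz)

def pesq_score(ref_path: str, synth_path: str) -> float:
    ref, _ = librosa.load(ref_path, sr=SR, mono=True)
    deg, _ = librosa.load(synth_path, sr=SR, mono=True)
    # Trim both signals to the same length; real evaluation may need proper alignment.
    n = min(len(ref), len(deg))
    return pesq(SR, ref[:n], deg[:n], "wb")

if __name__ == "__main__":
    print(pesq_score("original/utt001.wav", "synth/utt001.wav"))
```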
Both Coqui and ESPnet have been a pain so far, the former more than the latter.
Coqui can generate alignments externally with a Tacotron model, as in FastSpeech v1, but the default behaviour is to train an alignment head end to end (ref?). Besides, it seems to have moved on with its char utils but not with the script that computes attention masks, which I think is importing outdated stuff. The plan was to take a look at what kind of alignments they produce with Tacotron, so I could later reproduce them in the same format with MFA, but right now I couldn't get any of it to work.
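For reference, a rough sketch of what "the same format" could look like on the MFA side: turning an MFA TextGrid phone tier into per-phone frame durations for FastSpeech 2. This assumes the `textgrid` Python package, a tier named "phones", and a 22.05 kHz / 256-sample hop setup; all of those are my assumptions, not anything Coqui- or ESPnet-specific.

```python
# Hypothetical sketch: MFA TextGrid -> (phoneme, frame-duration) pairs for FastSpeech 2.
# Assumes `pip install textgrid`; tier name, sample rate and hop length are guesses.
import textgrid

SAMPLE_RATE = 22050
HOP_LENGTH = 256  # frames = seconds * SAMPLE_RATE / HOP_LENGTH

def mfa_durations(textgrid_path: str, tier_name: str = "phones"):
    tg = textgrid.TextGrid.fromFile(textgrid_path)
    tier = tg.getFirst(tier_name)
    phones, durations = [], []
    for interval in tier:
        label = interval.mark or "sil"  # silence intervals may come back unlabeled
        start = int(round(interval.minTime * SAMPLE_RATE / HOP_LENGTH))
        end = int(round(interval.maxTime * SAMPLE_RATE / HOP_LENGTH))
        phones.append(label)
        durations.append(end - start)
    return phones, durations

if __name__ == "__main__":
    p, d = mfa_durations("alignments/utt001.TextGrid")
    print(list(zip(p, d)))
```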
ESPnet supports MFA, but I'm having trouble with the MFA server's PostgreSQL connection (???). Right now it seems like my best option, because the problem looks like it comes from MFA rather than ESPnet, which should (hopefully) be easier to solve.
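To narrow down whether the problem really sits on the MFA/PostgreSQL side, a bare connectivity check could help. This is just a sketch using psycopg2; the host/port/database/user values are guesses and would have to be read from MFA's own server configuration or logs.

```python
# Hypothetical sketch: check whether the PostgreSQL instance MFA uses is reachable.
# Connection parameters below are placeholders; the real values should come from
# MFA's server configuration/logs, not from this snippet.
import psycopg2

def check_mfa_postgres(host="localhost", port=5432, dbname="postgres", user="postgres"):
    try:
        conn = psycopg2.connect(host=host, port=port, dbname=dbname, user=user,
                                connect_timeout=5)
    except psycopg2.OperationalError as exc:
        print(f"Connection failed: {exc}")
        return False
    with conn, conn.cursor() as cur:
        cur.execute("SELECT version();")
        print(cur.fetchone()[0])
    conn.close()
    return True

if __name__ == "__main__":
    check_mfa_postgres()
```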