Skip to content

iamanigeeit/coqui-tts

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EPIC TTS Models

This is the code for the paper Empirical Pruning Investigations Characterizing Speech-To-Text Models (EPIC TTS Models) submitted to Interspeech 2022.

It was forked from the 2021-10-30 version of Coqui-TTS and edited. Please refer to the readme there for more details. I made some changes for more flexible pre-processing. Otherwise, the main updates in my paper are here. If you wish to run it, you have to edit the relevant *_tacotron_dca.py files (there are no command-line args for that).

I implemented SNIP myself as there was no correct PyTorch version; it is not elegant as it requires reloading the model after computing each minibatch of gradients, but it gets around the issue of going backward() twice on the same graph.

The sparselearning library was tweaked to make it work for RNNs.

On hindsight, i should have used FastSpeech2 as it solves many of the robustness problems i encountered with Tacotron2. The repeat / skipping in Tacotron2 (even in the baseline) creates disproportionate effects in MOS. Modifying the stopnet loss helped somewhat but could not eliminate the problems.

About

No description, website, or topics provided.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published