This is a new major version, compatible with API version 3.0.0 or newer.
Compatible models (multispeaker, vctk and ljspeech) are attached below. Ensure they are downloaded, unzipped and structured as follows:
models
├── hifigan
│   ├── ljspeech
│   │   ├── config.json
│   │   └── model.pt
│   └── vctk
│       ├── config.json
│       └── model.pt
└── tts
    └── multispeaker
        ├── config.yaml
        └── model_weights.hdf5
The following commands should be sufficient to achieve this:
wget -P models/tts/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/multispeaker.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/ljspeech.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/vctk.zip
unzip -d models/tts/ models/tts/multispeaker.zip
unzip -d models/hifigan/ models/hifigan/ljspeech.zip
unzip -d models/hifigan/ models/hifigan/vctk.zip
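To confirm that the unzipped files ended up in the layout shown above, a small check along these lines may help. This is only a sketch: check_models is a hypothetical helper name and is not part of the worker, and the file list is copied from the directory tree in these notes.

```shell
#!/bin/sh
# check_models is a hypothetical helper (not shipped with the worker).
# It reports every expected model file that is missing under the given
# root directory (default: models) and returns non-zero if any is absent.
check_models() {
    root="${1:-models}"
    missing=0
    for f in \
        hifigan/ljspeech/config.json \
        hifigan/ljspeech/model.pt \
        hifigan/vctk/config.json \
        hifigan/vctk/model.pt \
        tts/multispeaker/config.yaml \
        tts/multispeaker/model_weights.hdf5
    do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After defining the function, running check_models models from the directory that contains models/ should print nothing and return 0 when all six files are in place.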
Additionally, the code is still compatible with older single-speaker models.
Changes:
- Added multispeaker model support (attached below)
- Added a workaround to synthesize longer sentences in multiple parts
- More information is sent to the API (predicted durations, normalized text, etc.)
- Minor bug fixes
Known issues:
- TF_VRAM_LIMIT does not reflect actual VRAM usage; it only covers the amount used by the TTS model and does not include the vocoder.
- A "WARNING: git_hash mismatch" message may be shown upon startup; the warning can be ignored.
Disclaimer - the LJSpeech and VCTK HiFiGAN vocoder models below are from this HiFiGAN repository.