Release Version 3.0.0 · TartuNLP/text-to-speech-worker

A new major version compatible with API version 3.0.0 or newer.

Compatible models (multispeaker, vctk and ljspeech) are attached below. Ensure they are downloaded, unzipped and structured as follows:

models
├── hifigan
│   ├── ljspeech
│   │   ├── config.json
│   │   └── model.pt
│   └── vctk
│       ├── config.json
│       └── model.pt
└── tts
    └── multispeaker
        ├── config.yaml
        └── model_weights.hdf5

The following commands should be sufficient to achieve this:

wget -P models/tts/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/multispeaker.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/ljspeech.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/vctk.zip
unzip -d models/tts/ models/tts/multispeaker.zip
unzip -d models/hifigan/ models/hifigan/ljspeech.zip
unzip -d models/hifigan/ models/hifigan/vctk.zip
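
To confirm that the unzipped files match the layout shown in the tree, a small check along the following lines may help. This is only a sketch: check_models is a hypothetical helper that is not part of this repository, and it assumes that each archive unpacks into a directory named after the model.

```shell
# Hypothetical helper (not part of the repository): verifies that the six
# files from the directory tree exist under the given root (default: models).
check_models() {
    root="${1:-models}"
    missing=0
    for f in \
        hifigan/ljspeech/config.json \
        hifigan/ljspeech/model.pt \
        hifigan/vctk/config.json \
        hifigan/vctk/model.pt \
        tts/multispeaker/config.yaml \
        tts/multispeaker/model_weights.hdf5
    do
        if [ ! -f "$root/$f" ]; then
            # Report every missing file instead of stopping at the first one.
            echo "missing: $root/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

After defining the function, running check_models models && echo "layout OK" from the directory that contains models reports any missing file on standard error.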

Additionally, the code is still compatible with older single-speaker models.

Changes:

Added multispeaker model support (attached below)
Added a workaround to synthesize longer sentences in multiple parts
More information is sent to the API (predicted durations, normalized text, etc.)
Minor bug fixes

Known issues:

TF_VRAM_LIMIT limits only the VRAM used by the TTS model. It does not cover the vocoder, so it does not reflect the actual total VRAM usage.
A "WARNING: git_hash mismatch" message may appear upon startup. The warning can be ignored.

Disclaimer - the LJSpeech and VCTK HiFiGAN vocoder models below are from this HiFiGAN repository.
