
Version 3.0.0

@liisaratsep released this on 04 Jan at 17:47

A new major version compatible with API version 3.0.0 or newer.

Compatible models (the multispeaker TTS model and the vctk and ljspeech HiFiGAN vocoders) are attached below. Download and unzip them so that they are structured as follows:

models
├── hifigan
│   ├── ljspeech
│   │   ├── config.json
│   │   └── model.pt
│   └── vctk
│       ├── config.json
│       └── model.pt
└── tts
    └── multispeaker
        ├── config.yaml
        └── model_weights.hdf5

The following commands should be sufficient to achieve this:

wget -P models/tts/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/multispeaker.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/ljspeech.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/vctk.zip
unzip -d models/tts/ models/tts/multispeaker.zip
unzip -d models/hifigan/ models/hifigan/ljspeech.zip
unzip -d models/hifigan/ models/hifigan/vctk.zip
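After unzipping, a quick check can confirm that every file from the layout above is in place. The following is a minimal POSIX sh sketch, not part of the worker itself. `check_models` is a hypothetical helper name, and it assumes the archives unpack into the directory names shown in the tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of the worker): report any model file missing
# from the expected layout. Takes the models root as an optional argument.
check_models() {
  root="${1:-models}"
  missing=0
  for f in \
    hifigan/ljspeech/config.json \
    hifigan/ljspeech/model.pt \
    hifigan/vctk/config.json \
    hifigan/vctk/model.pt \
    tts/multispeaker/config.yaml \
    tts/multispeaker/model_weights.hdf5
  do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $root/$f" >&2
      missing=1
    fi
  done
  # Return status is 0 only if every expected file exists.
  return "$missing"
}
```

For example, `check_models models && echo "all model files found"` prints each missing path to stderr and returns a non-zero status if anything is absent.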

Additionally, the code is still compatible with older single-speaker models.

Changes:

  • Added multispeaker model support (attached below)
  • Added a workaround to synthesize longer sentences in multiple parts
  • More information is sent to the API (predicted durations, normalized text, etc.)
  • Minor bug fixes

Known issues:

  • TF_VRAM_LIMIT does not reflect actual VRAM usage; it only covers the memory used by the TTS model, not the vocoder.
  • A "WARNING: git_hash mismatch" message is shown on startup; it can be safely ignored.

Disclaimer: the LJSpeech and VCTK HiFiGAN vocoder models below are from this HiFiGAN repository.