High-fidelity speech synthesis for Ukrainian using modern neural networks.
Try the demo on Hugging Face Spaces, or listen to the audio samples.
- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.
# Install from PyPI
pip install tts-uk
# OR, for the latest development version:
pip install git+https://github.com/egorsmkv/tts_uk
# OR, use git and local setup
git clone https://github.com/egorsmkv/tts_uk
cd tts_uk
uv sync # uv will handle the virtual environment
If uv is not installed yet, see the installation section of the uv documentation.
Alternatively, download the repository as a ZIP archive.
Code example:
import torchaudio
from tts_uk.inference import synthesis
sampling_rate = 44_100
# Perform the synthesis. The `synthesis` function returns:
# - mels: Mel spectrograms of the generated audio.
# - wave: The waveform synthesized by the vocoder, as a PyTorch tensor.
# - stats: A dictionary of synthesis statistics (processing time, duration, speech rate, etc.).
mels, wave, stats = synthesis(
    text="Ви можете протестувати синтез мовлення українською мовою. Просто введіть текст, який ви хочете прослухати.",
    voice="tetiana",  # tetiana, mykyta, lada
    n_takes=1,
    use_latest_take=False,
    token_dur_scaling=1,
    f0_mean=0,
    f0_std=0,
    energy_mean=0,
    energy_std=0,
    sigma_decoder=0.8,
    sigma_token_duration=0.666,
    sigma_f0=1,
    sigma_energy=1,
)
print(stats)
# Save the generated audio to a WAV file.
torchaudio.save("audio.wav", wave.cpu(), sampling_rate, encoding="PCM_S")
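To sanity-check the saved file without extra dependencies, its header can be read back with Python's standard-library `wave` module. This is a minimal sketch, not part of tts_uk: the helper name `wav_info` is hypothetical, and it assumes the file is a plain PCM WAV such as the `audio.wav` written above.

```python
import wave


def wav_info(path):
    """Read basic properties from a PCM WAV file header (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            # Duration in seconds = number of frames / frames per second.
            "duration_s": frames / rate,
        }


# Example usage after running the synthesis code above:
# print(wav_info("audio.wav"))
```

For a file produced by the example above, `sample_rate` should match the `sampling_rate` value passed to `torchaudio.save`.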
Try these Google Colab notebooks:
- CPU inference
- GPU inference on a T4 card (synthesizing a long document)
Or run synthesis in a terminal:
uv run example.py
If you need to synthesize whole articles, we recommend considering wtpsplit to split the text into sentences first.
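The idea of feeding an article to the synthesizer piece by piece can be sketched as follows. This is an illustration under stated assumptions, not the project's own pipeline: the regex splitter is a naive stand-in for wtpsplit, and the helper names `split_sentences` and `chunk_sentences` as well as the `max_chars` limit are hypothetical.

```python
import re
from typing import Iterable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '!', '?' or '…'.

    A stand-in for a real sentence segmenter such as wtpsplit.
    """
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences: Iterable[str], max_chars: int = 300) -> Iterator[str]:
    """Group consecutive sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is emitted as its own chunk.
    """
    chunk = ""
    for sentence in sentences:
        if chunk and len(chunk) + 1 + len(sentence) > max_chars:
            yield chunk
            chunk = sentence
        else:
            chunk = f"{chunk} {sentence}" if chunk else sentence
    if chunk:
        yield chunk


article = "Перше речення. Друге речення! Третє речення?"
chunks = list(chunk_sentences(split_sentences(article), max_chars=30))
print(chunks)

# Each chunk could then be passed as `text=` to `synthesis`, with the same
# keyword arguments as in the code example above.
```

Chunking by sentence keeps each synthesis call short and avoids cutting the text in the middle of a phrase. The resulting waveforms would then need to be concatenated in order.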
Feel free to contact us through the Issues section.
The code is released under the MIT license.
- Discord: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk
Also, follow our Speech-UK initiative on Hugging Face!