
tts_uk

Text-to-Speech for Ukrainian

PyPI Version License MIT PyPI Downloads DOI

High-fidelity speech synthesis for Ukrainian using modern neural networks.

Statuses

CI Pipeline Dependabot Updates Snyk Security

Demo

HF Space Google Colab

Check out our demo on the Hugging Face Space, or just listen to samples here.

Features

  • Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
  • Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
  • High-fidelity speech generation using the RAD-TTS++ acoustic model;
  • Fast vocoding using Vocos;
  • Synthesizes long sentences effectively;
  • Supports a sampling rate of 44.1 kHz;
  • Tested on Linux environments and Windows/WSL;
  • Python API (requires Python 3.9 or later);
  • CUDA-enabled for GPU acceleration.

Installation

```shell
# Install from PyPI
pip install tts-uk

# OR install the latest development version:
pip install git+https://github.com/egorsmkv/tts_uk

# OR clone the repository and set it up locally:
git clone https://github.com/egorsmkv/tts_uk
cd tts_uk
uv sync  # uv will create and manage the virtual environment
```

Read uv's installation section.

Also, you can download the repository as a ZIP archive.

Getting started

Code example:

```python
import torchaudio

from tts_uk.inference import synthesis

sampling_rate = 44_100

# Perform the synthesis. The `synthesis` function returns:
# - mels: Mel spectrograms of the generated audio.
# - wave: The waveform synthesized by the vocoder, as a PyTorch tensor.
# - stats: A dictionary of synthesis statistics (processing time, duration, speech rate, etc.).
mels, wave, stats = synthesis(
    text="Ви можете протестувати синтез мовлення українською мовою. Просто введіть текст, який ви хочете прослухати.",
    voice="tetiana",  # tetiana, mykyta, lada
    n_takes=1,
    use_latest_take=False,
    token_dur_scaling=1,
    f0_mean=0,
    f0_std=0,
    energy_mean=0,
    energy_std=0,
    sigma_decoder=0.8,
    sigma_token_duration=0.666,
    sigma_f0=1,
    sigma_energy=1,
)

print(stats)

# Save the generated audio to a WAV file.
torchaudio.save("audio.wav", wave.cpu(), sampling_rate, encoding="PCM_S")
```
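The exact keys of the `stats` dictionary are not documented here, but since it reports processing time and audio duration, you can derive a real-time factor (RTF) from it. Below is a minimal sketch; the key names `time_s` and `duration_s` are illustrative assumptions, not the library's actual schema:

```python
def real_time_factor(stats: dict) -> float:
    """Compute RTF: processing time divided by synthesized audio duration.
    Values below 1.0 mean faster-than-real-time synthesis.
    The key names used here ("time_s", "duration_s") are assumptions for
    illustration, not the documented schema of tts_uk's stats dict."""
    return stats["time_s"] / stats["duration_s"]

# Example with made-up numbers: 1.2 s of compute for 4.8 s of audio.
example_stats = {"time_s": 1.2, "duration_s": 4.8}
print(f"RTF: {real_time_factor(example_stats):.2f}")  # RTF: 0.25
```

Inspect the printed `stats` from your own run to see the real field names before adapting this.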

You can also run the example in Google Colab (see the Demo section above), or run synthesis in a terminal:

```shell
uv run example.py
```

If you need to synthesize whole articles, we recommend wtpsplit for splitting the text into sentences first.

Get help and support

Please feel free to connect with us using the Issues section.

License

The code is released under the MIT license; the bundled RADTTS and Vocos license files are MIT as well.

Model authors

Acoustic

Vocoder

Community

Discord

Also, follow our Speech-UK initiative on Hugging Face!

Acknowledgements