ZhTTS

A demo of a Chinese (zh) text-to-speech system that runs in real time on CPU (FastSpeech2 + MB-MelGAN).

RTF (real-time factor): 0.2 with FastSpeech2 and 1.6 with Tacotron2, for 24 kHz audio on an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz.
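RTF is the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below shows one way to measure it yourself. The helper names are hypothetical, and the 24000 Hz default is an assumption based on the 24 kHz figure above.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int = 24000) -> float:
    """RTF = time spent synthesizing / duration of the generated audio."""
    if num_samples == 0:
        raise ValueError("no audio was generated")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[str], Sequence[float]], text: str, sample_rate: int = 24000) -> float:
    """Time a single call to `synthesize` and return its RTF.

    `synthesize` is any function mapping text to audio samples,
    e.g. a bound method such as tts.synthesis (assumed, not verified here).
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)
```

With zhtts installed, `measure_rtf(tts.synthesis, text)` should give a number you can compare with the figures above. Results depend on the CPU and on text length, and the first call may include one-time warm-up costs, so timing a second call is likely more representative.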

This repo is mainly based on TensorFlowTTS, with a few improvements:

The TFLite models come from a Colab notebook; thanks to @azraelkuan.
Pauses are added at punctuation.
Text normalization (TN) is added, using chinese_text_normalization.
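To illustrate what the two frontend additions do, here is a toy sketch. It is not the code of chinese_text_normalization or zhtts: the function names, the punctuation set, the digit-by-digit rule (applied only to numbers directly followed by 年), and the use of `#3` as the pause marker (taken from the `tts.frontend` output in the Usage section) are all assumptions for illustration.

```python
import re
from typing import List

# Digit-by-digit readings, as in 2020年 -> 二零二零年.
DIGITS = "零一二三四五六七八九"

# Hypothetical set of punctuation marks that trigger a pause.
PAUSE_PUNCT = set("，。！？；：、,.!?;:")


def normalize_years(text: str) -> str:
    """Spell out ASCII digit runs that are directly followed by 年, one digit at a time."""
    return re.sub(
        r"[0-9]+(?=年)",
        lambda m: "".join(DIGITS[int(d)] for d in m.group(0)),
        text,
    )


def mark_pauses(text: str, pause: str = "#3") -> List[str]:
    """Split text into characters, replacing each punctuation mark with a pause token."""
    return [pause if ch in PAUSE_PUNCT else ch for ch in text]
```

For example, `normalize_years("2020年，这是")` gives `"二零二零年，这是"`, and `mark_pauses("你好，世界")` gives `["你", "好", "#3", "世", "界"]`. A real TN module handles far more cases; this rule would misread durations such as 12年, which should be 十二年.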

demo wav

text = "2020年，这是一个开源的端到端中文语音合成系统"

zhtts synthesis mp3

Install

pip install zhtts

Or clone this repo and run pip install . from its root directory.

Usage

import zhtts

text = "2020年，这是一个开源的端到端中文语音合成系统"
tts = zhtts.TTS()  # uses FastSpeech2 by default

tts.text2wav(text, "demo.wav")
# output: Save wav to demo.wav

tts.frontend(text)
# returns (normalized text, phoneme sequence):
# ('二零二零年，这是一个开源的端到端中文语音合成系统', 'sil ^ er4 #0 l ing2 #0 ^ er4 #0 l ing2 #0 n ian2 #0 #3 zh e4 #0 sh iii4 #0 ^ i2 #0 g e4 #0 k ai1 #0 ^ van2 #0 d e5 #0 d uan1 #0 d ao4 #0 d uan1 #0 zh ong1 #0 ^ uen2 #0 ^ v3 #0 ^ in1 #0 h e2 #0 ch eng2 #0 x i4 #0 t ong3 sil')

tts.synthesis(text)
# returns the waveform as a NumPy array:
# array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)
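If you want to post-process the array returned by `tts.synthesis` before saving it, a WAV file can be written with only the standard library. This is a minimal sketch, assuming the samples are floats in [-1.0, 1.0] at 24 kHz; the function names are hypothetical and this is not how `text2wav` is implemented.

```python
import array
import sys
import wave
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Clip floats to [-1.0, 1.0], scale (truncating) to signed 16-bit, return little-endian bytes."""
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    # WAV stores samples little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    return pcm.tobytes()


def write_wav(path: str, samples: Iterable[float], sample_rate: int = 24000) -> None:
    """Write float samples to a mono 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(float_to_pcm16(samples))
```

For example, `write_wav("edited.wav", tts.synthesis(text))` should produce a playable file, provided the sample rate assumption holds. For plain saving, `tts.text2wav` already does the job.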

web api demo

Clone this repo and install Flask (pip install flask), then run:

python app.py

Visit http://localhost:5000 for interactive TTS, or send an HTTP GET request to http://localhost:5000/api/tts?text=your%20sentence to get WAV audio back:

$ curl -o "helloworld.wav" "http://localhost:5000/api/tts?text=%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C"

%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C is the URL (percent) encoding of "你好世界".
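The same request can be made from Python, letting the standard library handle the percent-encoding. This is a sketch assuming the app.py server is running on localhost:5000 and serves the `/api/tts?text=...` endpoint described above; `tts_url` and `fetch_wav` are hypothetical helper names.

```python
from urllib.parse import quote
from urllib.request import urlopen


def tts_url(text: str, base_url: str = "http://localhost:5000") -> str:
    """Build the GET URL for /api/tts, percent-encoding the text as UTF-8 (spaces become %20)."""
    return f"{base_url}/api/tts?text={quote(text, safe='')}"


def fetch_wav(text: str, out_path: str, base_url: str = "http://localhost:5000") -> None:
    """Request synthesized audio from a running server and save the response body to a file."""
    with urlopen(tts_url(text, base_url)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

`fetch_wav("你好世界", "helloworld.wav")` is intended to be equivalent to the curl command above.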

Use tacotron2 instead of fastspeech2

Audio generated by Tacotron2 sounds better than FastSpeech2 output, but Tacotron2 is much slower (RTF 1.6 vs. 0.2 on the CPU listed above). To use Tacotron2, pass text2mel_name when creating the TTS object:

import zhtts
tts = zhtts.TTS(text2mel_name="TACOTRON")
# tts = zhtts.TTS(text2mel_name="FASTSPEECH2")
