The audio example webpage is still under development
In the following, `DATASET` refers to the name of a dataset, such as `LJSpeech` or `VCTK`.
You can install the Python dependencies with
```bash
pip3 install -r requirements.txt
```
You can send me a private message to obtain the pretrained models, and put them in `output/ckpt`.
For a single-speaker TTS, run
```bash
python3 synthesize.py --text "YOUR_DESIRED_TEXT" --model MODEL --restore_step RESTORE_STEP --mode single --dataset DATASET
```
For a multi-speaker TTS, run
```bash
python3 synthesize.py --text "YOUR_DESIRED_TEXT" --model MODEL --speaker_id SPEAKER_ID --restore_step RESTORE_STEP --mode single --dataset DATASET
```
The dictionary of learned speakers can be found at `preprocessed_data/DATASET/speakers.json`, and the generated utterances will be put in `output/result/`.
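If you are unsure which `SPEAKER_ID` to pass, a minimal sketch like the following can list the available speakers. It assumes `speakers.json` maps speaker names to integer IDs; the exact layout may differ:

```python
import json

# Hedged sketch: inspect the learned speaker dictionary. We assume it maps
# speaker names to integer IDs, e.g. {"p225": 0, "p226": 1, ...}.
with open("preprocessed_data/DATASET/speakers.json") as f:
    speakers = json.load(f)

for name, speaker_id in sorted(speakers.items(), key=lambda kv: kv[1]):
    print(f"{speaker_id:4d}  {name}")
```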
Batch inference is also supported; try
```bash
python3 synthesize.py --source preprocessed_data/DATASET/val.txt --model MODEL --restore_step RESTORE_STEP --mode batch --dataset DATASET
```
to synthesize all utterances in `preprocessed_data/DATASET/val.txt`.
The pitch/volume/speaking rate of the synthesized utterances can be controlled by specifying the desired pitch/energy/duration ratios. For example, you can increase the speaking rate by 20% and decrease the volume by 20% with
```bash
python3 synthesize.py --text "YOUR_DESIRED_TEXT" --model MODEL --restore_step RESTORE_STEP --mode single --dataset DATASET --duration_control 0.8 --energy_control 0.8
```
Please note that this controllability is inherited from FastSpeech2 and is not a core contribution of this work.
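For intuition, here is a minimal sketch (not this repo's actual code) of how FastSpeech2-style models typically apply such ratios to the predicted variances at inference time; the tensor shapes and the log-duration convention are assumptions:

```python
import torch

def apply_controls(pitch, energy, log_duration,
                   pitch_ratio=1.0, energy_ratio=1.0, duration_ratio=1.0):
    """Scale predicted prosody by the control ratios (hedged sketch)."""
    pitch = pitch * pitch_ratio    # per-phoneme pitch contour
    energy = energy * energy_ratio # per-phoneme energy
    # Durations are typically predicted in log scale and rounded to frame
    # counts; a ratio < 1.0 shortens phonemes, i.e. speeds up the utterance.
    duration = torch.clamp(
        torch.round(torch.exp(log_duration) * duration_ratio), min=0
    )
    return pitch, energy, duration
```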
The supported datasets are:
- LJSpeech
- VCTK
For a multi-speaker TTS with an external speaker embedder, download the ResCNN Softmax+Triplet pretrained model of philipperemy's DeepSpeaker for the speaker embedding and place it in `./deepspeaker/pretrained_models/`.
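As a rough illustration of what the embedder produces, the following sketch follows the usage shown in the deep-speaker README; the exact module paths, checkpoint filename, and sample file are assumptions and may differ from how this repo invokes it:

```python
import numpy as np
from deep_speaker.audio import read_mfcc
from deep_speaker.batcher import sample_from_mfcc
from deep_speaker.constants import SAMPLE_RATE, NUM_FRAMES
from deep_speaker.conv_models import DeepSpeakerModel

# Hedged sketch based on the deep-speaker README; the checkpoint name and
# path are placeholders for the ResCNN Softmax+Triplet weights.
model = DeepSpeakerModel()
model.m.load_weights(
    "./deepspeaker/pretrained_models/ResCNN_triplet_training_checkpoint_265.h5",
    by_name=True,
)

mfcc = sample_from_mfcc(read_mfcc("sample.wav", SAMPLE_RATE), NUM_FRAMES)
embedding = model.m.predict(np.expand_dims(mfcc, axis=0))
print(embedding.shape)  # one fixed-size embedding vector per utterance
```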
Run
```bash
python3 prepare_align.py --dataset DATASET
```
for some preparations.
For forced alignment, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. Pre-extracted alignments for the datasets are provided here; unzip the files into `preprocessed_data/DATASET/TextGrid/`. Alternatively, you can run the aligner yourself. After that, run the preprocessing script:
```bash
python3 preprocess.py --dataset DATASET
```
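To see what the preprocessing derives from the alignments, here is a minimal sketch that converts one MFA TextGrid into per-phone frame durations. The `tgt` package, the tier name, the file path, and the `hop_length`/`sampling_rate` values are assumptions, not this repo's exact code:

```python
import tgt  # TextGrid tools: pip3 install tgt

# Assumed audio parameters (check the repo's config for the real values).
sampling_rate, hop_length = 22050, 256

grid = tgt.io.read_textgrid("preprocessed_data/DATASET/TextGrid/SPEAKER/utt.TextGrid")
phones_tier = grid.get_tier_by_name("phones")

phones, durations = [], []
for interval in phones_tier.intervals:
    phones.append(interval.text)
    # Convert interval times (seconds) to mel-spectrogram frame counts.
    start = int(round(interval.start_time * sampling_rate / hop_length))
    end = int(round(interval.end_time * sampling_rate / hop_length))
    durations.append(end - start)

print(list(zip(phones, durations)))
```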
Train your model with
```bash
python3 train.py --model naive --dataset DATASET
```
Use
```bash
tensorboard --logdir output/log/DATASET
```
to serve TensorBoard on your localhost. The loss curves, synthesized mel-spectrograms, and audio samples are shown.