This repository gathers our efforts to evaluate and compare current multi-speaker TTS systems using objective metrics.
We use the UTMOS model to predict the naturalness mean opinion score (nMOS). In the HierSpeech++ paper, the authors used the open-source version of UTMOS, and the reported human nMOS and UTMOS scores are closely aligned. Although UTMOS cannot be considered an absolute evaluation metric, it offers an easy way to compare models in terms of quality.
Following previous works, we evaluate pronunciation accuracy with an ASR model: the Whisper Large v3 model for English text and the Paraformer model for Chinese text. We also remove all punctuation from the text before computing WER/CER.
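As a sketch of this scoring step (not the repository's actual implementation), the WER after punctuation removal can be computed with a plain word-level edit distance:

```python
import string

def edit_distance(ref, hyp):
    # classic dynamic-programming Levenshtein distance over word lists
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    # strip punctuation and lowercase before scoring, as described above
    table = str.maketrans("", "", string.punctuation)
    ref = reference.translate(table).lower().split()
    hyp = hypothesis.translate(table).lower().split()
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

CER works the same way on character lists instead of word lists, which is why it is preferred for Chinese text.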
To measure the similarity between the synthesized voice and the original speaker, we compute the Speaker Encoder Cosine Similarity (SECS). Two speaker encoders are available for computing SECS: ERes2Net-large and WavLM-base-plus-sv.
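Whichever encoder is used, SECS reduces to the cosine similarity between the two embedding vectors it produces. A minimal sketch (the embeddings here stand in for real encoder outputs):

```python
import math

def secs(emb_a, emb_b):
    # cosine similarity between two speaker embeddings:
    # dot(a, b) / (||a|| * ||b||), in [-1, 1]; higher = more similar
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return dot / (norm_a * norm_b)
```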
We compute the mel cepstral distortion (MCD) between the predicted wav and the ground-truth wav as follows,
$$ \operatorname{MCD}\left(\mathbf{c}_p, \mathbf{c}_g\right)=\frac{10}{\ln 10} \sqrt{2 \sum_{k=1}^{M_c}\left[c_p(k)-c_g(k)\right]^2} $$
where $c_p(k)$ and $c_g(k)$ are the $k$-th mel cepstral coefficients of the predicted and ground-truth frames, and $M_c$ is the order of the mel cepstrum.
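A minimal sketch of the per-frame formula above, averaged over aligned frames (real pipelines first extract mel cepstra and time-align them, e.g. with DTW; that part is omitted here):

```python
import math

def mcd(c_pred, c_gt):
    # c_pred, c_gt: aligned frames of mel cepstral coefficients,
    # each frame a list of M_c coefficients (0th energy term excluded)
    total = 0.0
    for cp, cg in zip(c_pred, c_gt):
        dist = sum((p - g) ** 2 for p, g in zip(cp, cg))
        # per-frame MCD in dB: (10 / ln 10) * sqrt(2 * sum of squared diffs)
        total += (10.0 / math.log(10)) * math.sqrt(2.0 * dist)
    return total / len(c_pred)  # mean over frames
```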
We compute the perceptual evaluation of speech quality (PESQ) score via pypesq.
We compute the root mean square error (RMSE) of the F0 estimate as follows,
$$ \operatorname{RMSE}(\mathbf{f0}, \hat{\mathbf{f0}})=\sqrt{\frac{1}{N} \sum_{i=1}^{N}\left(\mathbf{f0}_{i}-\hat{\mathbf{f0}}_{i}\right)^2} $$
where $N$ is the number of frames, $\mathbf{f0}_{i}$ is the ground-truth F0 of the $i$-th frame, and $\hat{\mathbf{f0}}_{i}$ is the predicted F0.
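The formula above is a direct frame-wise RMSE; a minimal sketch (in practice it is typically restricted to frames voiced in both signals):

```python
import math

def f0_rmse(f0_ref, f0_pred):
    # RMSE over N aligned F0 values (Hz), per the equation above
    n = len(f0_ref)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(f0_ref, f0_pred)) / n)
```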
```bash
pip install -r requirements.txt
bash run.sh
```
- ViSQOL score.
- Voiced/unvoiced (V/UV) errors.
- We borrow some code from Amphion for the computation of some evaluation metrics.