🧑‍🎤 Expressive Text-to-Speech

This repository is a fork of Coqui-AI's 🐸TTS, used by our AI-Unicamp-CPQD group for research on expressive TTS. The original code is kept in the "main" branch, which is not our default branch.

We use the "unicamp" branch as our default branch, while the "main" branch tracks the original repository and is kept up to date. The original README.md can be found here.

🔍 About the group

We are an expressive TTS research group located at Unicamp and CPQD (Brazil).

🔨 Implementations

Expressive Models

Tacotron 2
FastPitch

Expressive Datasets

EmoV-DB
IEMOCAP
ESD

Style Encoders

Look-Up
Reference Encoder (Coarse/Fine-Grained)
GST
VAE
VQ-VAE
VAE+Flow
Diffusion

Disentanglement Blocks

Style Classifier
Speaker Classifier + GRL (Gradient Reversal Layer)
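
For readers unfamiliar with the GRL, the idea can be sketched without any framework: the layer is the identity in the forward pass and flips (and optionally scales) the gradient in the backward pass, so the encoder is pushed to remove the information the classifier is trying to predict. The class name `GradientReversal` and the parameter `lambda_` below are illustrative assumptions, not names taken from this repository; in an autograd framework the same behavior would normally be written as a custom autograd function.

```python
class GradientReversal:
    """Framework-free sketch of a Gradient Reversal Layer (hypothetical helper).

    forward:  returns the input unchanged.
    backward: returns the incoming gradient multiplied by -lambda_.
    """

    def __init__(self, lambda_=1.0):
        # lambda_ controls how strongly the reversed gradient is weighted.
        self.lambda_ = lambda_

    def forward(self, x):
        # Identity: the classifier sees the features as they are.
        return list(x)

    def backward(self, grad_output):
        # Reverse the sign so the upstream encoder is trained adversarially.
        return [-self.lambda_ * g for g in grad_output]


grl = GradientReversal(lambda_=0.5)
features = [0.2, -1.0, 3.0]
passed_through = grl.forward(features)      # same values as `features`
reversed_grad = grl.backward([1.0, -2.0])   # sign flipped, scaled by 0.5
```

Placing such a layer between the style embedding and a speaker classifier is one common way to discourage speaker information from leaking into the style representation.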

Style Reference Features

Pitch
Energy
Mel-Spectrogram

Aggregation Types

Sum, Concat or AdaIN
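
As a rough illustration of how these three options differ, the sketch below combines a content (text encoder) representation with style information using plain Python lists. All function names are hypothetical, and the AdaIN variant assumes the per-channel scale `gamma` and shift `beta` have already been predicted from the style embedding; this is a minimal sketch, not the code used in this repository.

```python
import math


def aggregate_sum(content, style):
    """Element-wise sum of one content frame and a style vector (same size)."""
    if len(content) != len(style):
        raise ValueError("sum aggregation needs equal dimensions")
    return [c + s for c, s in zip(content, style)]


def aggregate_concat(content, style):
    """Concatenate the style vector to one content frame (size grows)."""
    return list(content) + list(style)


def aggregate_adain(content, gamma, beta, eps=1e-5):
    """Adaptive instance normalization over channels.

    content: list of channels, each a list of values over time.
    gamma, beta: one scale and one shift per channel (assumed to be
    predicted from the style embedding by some upstream layer).
    """
    out = []
    for channel, g, b in zip(content, gamma, beta):
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        std = math.sqrt(var + eps)
        # Normalize the channel, then re-scale and shift it with style stats.
        out.append([g * (x - mean) / std + b for x in channel])
    return out


summed = aggregate_sum([1.0, 2.0], [0.5, 0.5])
joined = aggregate_concat([1.0, 2.0], [0.5])
styled = aggregate_adain([[1.0, 3.0]], gamma=[2.0], beta=[1.0])
```

Sum keeps the feature size fixed but requires matching dimensions, concat preserves both inputs at the cost of a wider vector, and AdaIN lets the style control the statistics of the content features rather than being added to them.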

Enhancing Losses

Orthogonal Loss
CLIP Loss
Cycle Consistency Loss (*)