🧑‍🎤 Expressive Text-to-Speech

This repository is a fork of Coqui-AI's 🐸TTS, used for research on expressive TTS by our AI-Unicamp-CPQD group. The original code is kept in the "main" branch, which is not the default branch of this repository.

We use the "unicamp" branch as our default branch, while the "main" branch stays in sync with the original, up-to-date upstream code. You can see the original README.md here.

🔍 About the group

We are an expressive TTS research group located at Unicamp and CPQD (Brazil).

🔨 Implementations

Expressive Models

  • Tacotron 2
  • FastPitch

Expressive Datasets

  • EMOVDB
  • IEMOCAP
  • ESD

Style Encoders

  • Look-Up
  • Reference Encoder (Coarse/Fine-Grained)
  • GST
  • VAE
  • VQ-VAE
  • VAE+Flow
  • Diffusion

Disentanglement Blocks

  • Style Classifier
  • Speaker Classifier + GRL (Gradient Reversal Layer)
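As a rough illustration of what a gradient reversal layer does, the sketch below writes the forward and backward passes by hand in plain Python. The class name `GradientReversal` and the `lambda_` scale are hypothetical and are not taken from this repository; in an autograd framework the same idea would normally be written as a custom autograd function placed between the style encoder and the speaker classifier.

```python
class GradientReversal:
    """Identity in the forward pass, sign-flipped (and scaled) gradient in the backward pass.

    Hypothetical illustration: `lambda_` controls how strongly the encoder is
    pushed to remove speaker information that the classifier could exploit.
    """

    def __init__(self, lambda_=1.0):
        self.lambda_ = lambda_

    def forward(self, features):
        # The classifier sees the features unchanged.
        return list(features)

    def backward(self, grad_output):
        # The gradient flowing back to the encoder is multiplied by -lambda_,
        # so the encoder is updated to *increase* the classifier's loss.
        return [-self.lambda_ * g for g in grad_output]


grl = GradientReversal(lambda_=0.5)
features = grl.forward([1.0, -2.0])        # unchanged values
encoder_grad = grl.backward([0.2, -0.4])   # reversed and scaled gradient
```

The classifier itself is still trained with the ordinary gradient; only the signal reaching the layers before the GRL is reversed.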

Style Reference Features

  • Pitch
  • Energy
  • Mel-Spectrogram

Aggregation Types

  • Sum, Concat or AdaIN
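The three aggregation options can be sketched for a single encoder frame and a single style embedding as follows. This is a minimal plain-Python illustration, not the repository's code: the function names are hypothetical, and the AdaIN variant uses the style vector's own mean and standard deviation as shift and scale, whereas a trained model would often predict them with learned layers.

```python
from statistics import fmean, pstdev


def aggregate_sum(content, style):
    """Element-wise sum; requires both vectors to have the same size."""
    if len(content) != len(style):
        raise ValueError("sum aggregation needs equal dimensions")
    return [c + s for c, s in zip(content, style)]


def aggregate_concat(content, style):
    """Concatenation; the output size is len(content) + len(style)."""
    return list(content) + list(style)


def aggregate_adain(content, style, eps=1e-5):
    """Normalize the content vector, then rescale and shift it with the
    style statistics (assumption: statistics are taken over the whole vector)."""
    c_mean, c_std = fmean(content), pstdev(content)
    s_mean, s_std = fmean(style), pstdev(style)
    return [s_std * (c - c_mean) / (c_std + eps) + s_mean for c in content]


content = [1.0, 2.0, 3.0]
style = [10.0, 20.0, 30.0]
summed = aggregate_sum(content, style)
concatenated = aggregate_concat(content, style)
stylized = aggregate_adain(content, style)
```

Sum keeps the encoder dimension but forces the style embedding to match it, concatenation grows the dimension seen by the decoder, and AdaIN keeps the dimension while letting the style control only the scale and offset of the content features.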

Enhancing Losses

  • Orthogonal Loss
  • CLIP Loss
  • Cycle Consistency Loss (*)
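For readers unfamiliar with the orthogonal loss listed above, one common formulation penalizes the squared Frobenius norm of the cross-correlation between a batch of style embeddings and a batch of speaker embeddings. The sketch below assumes that formulation; the function name and input layout are hypothetical, and the loss actually used in this repository may be defined differently.

```python
def orthogonal_loss(style_batch, speaker_batch):
    """Squared Frobenius norm of S^T P.

    Assumed layout: each argument is a list of rows, one embedding per
    utterance, with the same number of rows (the batch size) in both.
    """
    if len(style_batch) != len(speaker_batch):
        raise ValueError("both batches must contain the same number of items")
    style_dim = len(style_batch[0])
    speaker_dim = len(speaker_batch[0])
    total = 0.0
    for i in range(style_dim):
        for j in range(speaker_dim):
            # Entry (i, j) of S^T P: correlation over the batch between
            # style dimension i and speaker dimension j.
            dot = sum(s[i] * p[j] for s, p in zip(style_batch, speaker_batch))
            total += dot * dot
    return total


# Style and speaker dimensions that vary independently over the batch give 0.
uncorrelated = orthogonal_loss([[1.0], [1.0]], [[1.0], [-1.0]])
# Dimensions that move together are penalized.
correlated = orthogonal_loss([[1.0], [1.0]], [[1.0], [1.0]])
```

Minimizing this term pushes the two embedding spaces toward carrying non-overlapping information, which complements the classifier-based disentanglement blocks.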