Text To Speech

- Survey
- TTS
- Voco
- Emotion
- VITS
- Efficient
- Projects
- Multilingual
- Evaluation
- Misc

Survey

TTS

Long-Form Speech Generation with Spoken Language Models, arXiv 2412.18603
Se Jin Park, Julian Salazar, Aren Jansen, ..., Yong Man Ro, RJ Skerry-Ryan · (google.github)

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch, arXiv 2412.08237
Xingchen Song, Mengtao Xing, Changwei Ma, ..., Zhendong Peng, Zhiyong Wu

Debatts: Zero-Shot Debating Text-to-Speech Synthesis, arXiv 2411.06540
Yiqiao Huang, Yuancheng Wang, Jiaqi Li, ..., Shunsi Zhang, Zhizheng Wu

Very Attentive Tacotron: Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech, arXiv 2410.22179
Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, ..., Julian Salazar, David Kao · (sequence-layers - google) · (x)

Voco

Emotion

EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector, arXiv 2411.02625
Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, ..., Seong-Whan Lee

VITS

Efficient

Projects

tts-generation-webui - rsxdalv
alltalk_tts - erew123
OuteTTS-0.1-350M: a text-to-speech synthesis model that leverages pure language modeling without external adapters or complex architectures 🤗 · (outeai) · (𝕏)

Multilingual

Evaluation

Misc

Personal views on the current state of the TTS field