LG AI Research
Stars
A TTS model capable of generating ultra-realistic dialogue in one pass.
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
Concrete: TFHE compiler that converts Python programs into their FHE equivalents
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
The official implementation of GTCRN, an ultra-lightweight SE model.
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis
Evaluation Protocol for Large-Scale Zero-Shot TTS Literature
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch
Official repository of NeXt-TDNN for speaker verification
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
The official Implementation of PeriodWave and PeriodWave-Turbo
Official inference repo for FLUX.1 models
Code for "Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction" (ACL 2024)
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Simple text to phones converter for multiple languages
Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
vits2 backbone with multilingual-bert (Korean support)