- CPqD
- São Paulo, Brasil
- leonardoboulitreau.github.io/
Stars
Unsupervised Music Source Separation Using Differentiable Parametric Source Models
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
Learn how to create, develop, and maintain a state-of-the-art MLOps code base
Reference-aware automatic speech evaluation toolkit
Danny-NUS / SinTechSVS
Forked from yamathcy/ISMIR2022J-POP. Supplementary Materials of paper "SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System" by Junchuan Zhao, Low Qi Hong Chetwin, Ye Wang.
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes. SOTA in AudioSet
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
A course on aligning smol models.
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
Audio Codec Speech processing Universal PERformance Benchmark
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
MSP-Podcast Challenge Baseline Code for Interspeech 2025
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Create an AI clone of yourself from your WhatsApp chats (using Llama 3)
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…
Automatic headphone equalization from frequency responses
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
FlashSpeech: Efficient Zero-Shot Speech Synthesis
[SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.