Integration of the Open-Source Sesame CSM-1B Conversational Speech #12657

rodrigoGA · 2025-03-18T12:37:29Z

This is a request to add support for the open-source Sesame CSM-1B model to NVIDIA NeMo’s TTS framework. CSM-1B uses a Llama backbone paired with an audio decoder to generate RVQ (residual vector quantization) audio codes from text and audio inputs, which makes it well suited to producing natural, conversational speech.
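For readers unfamiliar with RVQ audio codes, here is a minimal pure-Python sketch of residual vector quantization in general. It is not the CSM-1B decoder and not a NeMo API. The function names (`nearest`, `rvq_encode`, `rvq_decode`), the random toy codebooks, and the 4-dimensional "frame" are all assumptions made for illustration only.

```python
import random

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(vec, codebooks):
    """Quantize vec in stages; each stage encodes the residual left by the previous one."""
    residual = list(vec)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next codebook only models what is left.
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct a vector by summing the selected entry from every codebook."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out

# Hypothetical toy setup: 3 codebooks of 8 random 4-dim entries, each level at a finer scale.
rng = random.Random(0)
codebooks = [
    [[rng.uniform(-1, 1) / (2 ** level) for _ in range(4)] for _ in range(8)]
    for level in range(3)
]
frame = [0.3, -0.5, 0.1, 0.7]          # stand-in for one frame of audio features
codes = rvq_encode(frame, codebooks)   # one integer code per codebook
recon = rvq_decode(codes, codebooks)   # approximate reconstruction of the frame
print(codes, recon)
```

In a model of the kind described above, the network would predict integer codes like `codes` for each audio frame, and a separate codec would turn them back into a waveform. How CSM-1B actually does this should be taken from its own repository, not from this sketch.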

rodrigoGA assigned okuchaiev Mar 18, 2025
