Explore all analyzed open source repositories
CSM (Conversational Speech Model) is an advanced speech generation model from SesameAILabs, designed to create RVQ audio codes from text and audio inputs. It leverages a Llama backbone and a smaller audio decoder for Mimi audio codes, enabling high-quality, context-aware speech synthesis. The model is now natively available in Hugging Face Transformers, making it accessible for researchers and developers.