Repository History

Explore all analyzed open-source repositories

Topic: Speech Synthesis
Chatterbox: State-of-the-Art Open-Source Text-to-Speech by Resemble AI

Chatterbox is a family of open-source text-to-speech (TTS) models developed by Resemble AI for high-quality speech generation. It includes Chatterbox-Turbo, an efficient model that supports paralinguistic tags for added realism, alongside multilingual and general-purpose TTS options. The models target voice agents, narration, and creative workflows, and include responsible-AI features such as built-in watermarking.

Apr 19, 2026
Spark-TTS: Efficient LLM-Based Text-to-Speech with Zero-Shot Voice Cloning

Spark-TTS is a text-to-speech system that uses a large language model (LLM) for accurate, natural-sounding voice synthesis. Built on Qwen2.5, it offers a streamlined, efficient design, high-quality zero-shot voice cloning, bilingual support for Chinese and English, and controllable speech generation, making it suitable for both research and production.

Apr 5, 2026
index-tts-lora: High-Quality Speech Synthesis with LoRA Fine-tuning

index-tts-lora applies LoRA fine-tuning to the index-tts framework for high-quality speech synthesis. It enhances prosody and naturalness for both single-speaker and multi-speaker voices. The project provides practical methods for training and inference, making advanced voice synthesis more accessible.

Mar 23, 2026
audio2photoreal: Synthesizing Photorealistic Codec Avatars from Audio

audio2photoreal is a GitHub repository from Facebook Research that provides code and a dataset for generating photorealistic Codec Avatars driven solely by audio input. The project synthesizes embodied human motion for conversations and offers tools for training, testing, and running pretrained models to create lifelike digital representations. It sits at the intersection of AI-driven computer graphics and virtual reality.

Nov 20, 2025
chatterbox-vllm: Accelerating Chatterbox TTS with vLLM for Enhanced Performance

chatterbox-vllm is a port of the Chatterbox text-to-speech (TTS) model to vLLM, designed to improve generation speed and GPU memory efficiency. This personal project aims to provide a more efficient, easier-to-integrate solution for speech synthesis, with substantial speedups over the original implementation. It is currently usable and demonstrates benchmark-topping throughput, but it relies on internal vLLM APIs and hacky workarounds, and further refactoring is planned.

Oct 11, 2025