Repository History
Explore all analyzed open source repositories

Chatterbox: State-of-the-Art Open-Source Text-to-Speech by Resemble AI
Chatterbox is a family of open-source text-to-speech (TTS) models developed by Resemble AI for high-quality speech generation. It includes Chatterbox-Turbo, an efficient model that supports paralinguistic tags for added realism, alongside multilingual and general-purpose TTS models. The models target voice agents, narration, and creative workflows, and include built-in watermarking as a responsible-AI feature.

Spark-TTS: Efficient LLM-Based Text-to-Speech with Zero-Shot Voice Cloning
Spark-TTS is a text-to-speech system that uses a large language model (LLM) for accurate, natural-sounding voice synthesis. Built on Qwen2.5, it offers a streamlined, efficient design, high-quality zero-shot voice cloning, bilingual support for Chinese and English, and controllable speech generation, making it suitable for both research and production.

index-tts-lora: High-Quality Speech Synthesis with LoRA Fine-tuning
index-tts-lora applies LoRA fine-tuning to the index-tts framework for high-quality speech synthesis. It improves prosody and naturalness for both single-speaker and multi-speaker voices. The project provides practical methods for training and inference, making advanced voice synthesis more accessible.

audio2photoreal: Synthesizing Photorealistic Codec Avatars from Audio
audio2photoreal is a repository from Facebook Research that provides code and a dataset for generating photorealistic Codec Avatars driven solely by audio input. The project synthesizes human embodiment in conversations and offers tools for training, testing, and running pretrained models to create lifelike digital representations. It is an advance in AI-driven computer graphics and virtual reality.

chatterbox-vllm: Accelerating Chatterbox TTS with vLLM for Enhanced Performance
chatterbox-vllm is a port of the Chatterbox text-to-speech (TTS) model to vLLM, designed to improve generation speed and GPU memory efficiency. This personal project aims to be more efficient and easier to integrate than the original implementation, over which it offers substantial speedups. It is currently usable and demonstrates benchmark-topping throughput, but it relies on internal vLLM APIs and hacky workarounds, and further refactoring is planned.