Voicebox: The Open-Source AI Voice Studio for Cloning and Dictation
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
Voicebox is an open-source AI voice studio for cloning voices, generating speech in multiple languages, and dictating into any application. It provides a local-first voice I/O stack as an alternative to cloud-based services. Because it runs entirely on your local machine, your voice data stays private and under your control.
Use at your own risk
OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.
Introduction
Voicebox is an open-source AI voice studio designed for local-first operation, positioned as an alternative to cloud-based services such as ElevenLabs and WisprFlow. It can clone voices from short audio samples, generate speech in 23 languages using 7 different TTS engines, and dictate text into any application through a global hotkey. It also integrates with AI agents, giving them a full voice input/output stack that runs entirely on your machine, so voice data stays local and under your control.
Key features include:
- Complete privacy: All models, voice data, and captures remain on your local machine.
- Diverse TTS engines: Access 7 different Text-to-Speech engines, including Qwen3-TTS and LuxTTS.
- Multi-language support: Generate speech in 23 languages, from English to Arabic, Japanese, and Hindi.
- Voice cloning and presets: Create zero-shot voice clones or utilize over 50 curated preset voices.
- Advanced audio effects: Apply pitch shift, reverb, delay, and other post-processing effects.
- Global dictation: Use a hotkey for system-wide voice input, with Whisper-based Speech-to-Text.
- Agent integration: Enable AI agents to speak in cloned voices via a simple API.
Installation
Pre-built binaries are available for macOS and Windows, and Docker offers a containerized deployment. Linux users can build from source.
- macOS (Apple Silicon): Download DMG
- macOS (Intel): Download DMG
- Windows: Download MSI
For Docker:
docker compose up
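After starting the container, a script may want to wait until the API is reachable before sending requests. The following is a minimal sketch, not part of Voicebox itself. It assumes the Docker setup publishes the API on 127.0.0.1:17493, the address used in the curl examples on this page; the source does not state the container's port mapping, and `wait_for_port` is a hypothetical helper name.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=17493, timeout=30.0, interval=0.5):
    """Poll a TCP port until it accepts a connection or `timeout` elapses.

    Assumption: Voicebox listens on 127.0.0.1:17493 (taken from the curl
    examples on this page, not from the Docker documentation).
    Returns True if a connection succeeded, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect only shows that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    print("reachable" if wait_for_port() else "not reachable")
```

A successful connect does not prove the models are loaded; it only confirms that a process is accepting connections on that port.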
For detailed instructions, including building from source on Linux, please refer to the official documentation.
Examples
Voicebox exposes a local HTTP API for integration into your own applications and scripts. The examples below use curl against 127.0.0.1:17493:
Generate speech:
curl -X POST http://127.0.0.1:17493/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "profile_id": "abc123", "language": "en"}'
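The same request can be issued from a script. This is a minimal sketch using only the Python standard library; the URL, path, and JSON field names are copied from the curl example above, while the function names are hypothetical. The response format is not described here, so the sketch returns the raw bytes without interpreting them.

```python
import json
import urllib.request

# Base URL taken from the curl example; adjust if your instance differs.
BASE_URL = "http://127.0.0.1:17493"


def build_generate_request(text, profile_id, language="en", base_url=BASE_URL):
    """Build (but do not send) a POST /generate request with a JSON body."""
    payload = {"text": text, "profile_id": profile_id, "language": language}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(text, profile_id, language="en", timeout=60.0):
    """Send the request and return the raw response body as bytes.

    Assumption: the response schema is unknown, so nothing is parsed here.
    """
    request = build_generate_request(text, profile_id, language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # "abc123" is the placeholder profile id from the curl example.
    body = generate("Hello world", "abc123")
    print(f"received {len(body)} bytes")
```

Separating request construction from sending makes the payload easy to inspect or unit-test without a running Voicebox instance.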
Agent voice output:
curl -X POST http://127.0.0.1:17493/speak \
  -H "Content-Type: application/json" \
  -H "X-Voicebox-Client-Id: my-script" \
  -d '{"text": "Deploy complete.", "profile": "Morgan"}'
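For scripts that announce their own progress, it can help to treat voice output as optional so that a missing Voicebox instance does not fail the job. The sketch below is one way to do that with the standard library. The endpoint, header, and JSON fields come from the curl example above; the helper names and the fail-soft behavior are assumptions for illustration, not documented Voicebox behavior.

```python
import json
import urllib.error
import urllib.request

# Endpoint taken from the curl example above.
SPEAK_URL = "http://127.0.0.1:17493/speak"


def build_speak_request(text, profile, client_id="my-script", url=SPEAK_URL):
    """Build a POST /speak request carrying the client id header."""
    payload = {"text": text, "profile": profile}
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Voicebox-Client-Id": client_id,
        },
        method="POST",
    )


def announce(text, profile="Morgan", timeout=5.0, url=SPEAK_URL):
    """Try to speak `text`; return True on a 2xx response, False otherwise."""
    request = build_speak_request(text, profile, url=url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Server not running, unreachable, or returned an HTTP error:
        # report failure instead of raising into the calling script.
        return False


if __name__ == "__main__":
    if not announce("Deploy complete."):
        print("Voicebox not reachable; skipping voice output")
```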
Transcribe an audio file:
curl -X POST http://127.0.0.1:17493/transcribe \
  -F "audio=@recording.wav" \
  -F "model=whisper-turbo"
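The `-F` flags make curl send a multipart/form-data body. The sketch below shows roughly what that upload looks like when built by hand with the Python standard library. The field names `audio` and `model` and the URL come from the curl example; the helper names and the `audio/wav` content type are assumptions, and the response is returned as raw bytes because its format is not described here.

```python
import urllib.request
import uuid

# Endpoint taken from the curl example above.
TRANSCRIBE_URL = "http://127.0.0.1:17493/transcribe"


def encode_multipart(fields, files):
    """Encode form fields and files as a multipart/form-data body.

    fields: dict of name -> str
    files:  dict of name -> (filename, content_bytes, content_type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    for name, (filename, content, content_type) in files.items():
        disposition = (
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'
        )
        lines.append(f"--{boundary}".encode())
        lines.append(disposition.encode())
        lines.append(f"Content-Type: {content_type}".encode())
        lines.append(b"")
        lines.append(content)
    # Closing delimiter, followed by a final CRLF.
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(audio_bytes, filename="recording.wav",
                             model="whisper-turbo", url=TRANSCRIBE_URL):
    """Build a POST /transcribe request with the audio file and model name."""
    body, content_type = encode_multipart(
        fields={"model": model},
        # Assumption: WAV input, as in the curl example's recording.wav.
        files={"audio": (filename, audio_bytes, "audio/wav")},
    )
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


def transcribe(path, timeout=120.0):
    """Upload the file at `path` and return the raw response body as bytes."""
    with open(path, "rb") as handle:
        request = build_transcribe_request(handle.read(), filename=path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

In practice a third-party HTTP client would handle the multipart encoding for you; the manual version is shown only to keep the sketch dependency-free.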
List voice profiles:
curl http://127.0.0.1:17493/profiles
Voicebox also ships with a built-in Model Context Protocol (MCP) server, allowing MCP-aware agents like Claude Code or Cursor to easily integrate voice capabilities.
Why Use Voicebox?
Voicebox's local-first design keeps your voice data and models on your machine rather than on a third-party service. It combines multi-engine voice cloning, expressive speech generation, audio effects, and unlimited generation length in one application. Agent integration and global dictation make it useful for developers, content creators, and anyone who needs both voice input and output. It is built with Tauri (Rust) for native performance and supports a wide range of GPUs across platforms.
Links
- GitHub Repository: jamiepine/voicebox
- Official Website: voicebox.sh
- Documentation: docs.voicebox.sh
- Latest Releases: GitHub Releases