{"name":"Voicebox: The Open-Source AI Voice Studio for Cloning and Dictation","description":"Voicebox is an innovative open-source AI voice studio that allows users to clone voices, generate speech in multiple languages, and dictate into any application. It provides a comprehensive, local-first voice I/O stack, offering a powerful alternative to cloud-based solutions. This tool ensures complete privacy and control over your voice data, running entirely on your local machine.","github":"https://github.com/jamiepine/voicebox","url":"https://osrepos.com/repo/jamiepine-voicebox","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/jamiepine-voicebox","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/jamiepine-voicebox.md","json":"https://osrepos.com/repo/jamiepine-voicebox.json","topics":["AI","Voice Cloning","Speech Synthesis","Text-to-Speech","Speech-to-Text","Open Source","TypeScript","Desktop App"],"keywords":["AI","Voice Cloning","Speech Synthesis","Text-to-Speech","Speech-to-Text","Open Source","TypeScript","Desktop App"],"stars":null,"summary":"Voicebox is an innovative open-source AI voice studio that allows users to clone voices, generate speech in multiple languages, and dictate into any application. It provides a comprehensive, local-first voice I/O stack, offering a powerful alternative to cloud-based solutions. This tool ensures complete privacy and control over your voice data, running entirely on your local machine.","content":"## Introduction\n\nVoicebox is an innovative open-source AI voice studio designed for local-first operation, offering a powerful alternative to cloud-based solutions like ElevenLabs and WisprFlow. 
This comprehensive application allows users to clone voices from short audio samples, generate speech in 23 languages across 7 different TTS engines, and dictate text into any application using a global hotkey. Voicebox also integrates seamlessly with AI agents, providing a full voice input/output stack that runs entirely on your machine, ensuring complete privacy and control over your data.\n\nKey features include:\n*   **Complete privacy:** All models, voice data, and captures remain on your local machine.\n*   **Diverse TTS engines:** Access 7 different Text-to-Speech engines, including Qwen3-TTS and LuxTTS.\n*   **Multi-language support:** Generate speech in 23 languages, from English to Arabic, Japanese, and Hindi.\n*   **Voice cloning and presets:** Create zero-shot voice clones or utilize over 50 curated preset voices.\n*   **Advanced audio effects:** Apply pitch shift, reverb, delay, and other post-processing effects.\n*   **Global dictation:** Use a hotkey for system-wide voice input, with Whisper-based Speech-to-Text.\n*   **Agent integration:** Enable AI agents to speak in cloned voices via a simple API.\n\n## Installation\n\nGetting started with Voicebox is straightforward. Pre-built binaries are available for macOS and Windows, while Docker provides a convenient option for containerized deployment. Linux users can build from source.\n\n**For macOS (Apple Silicon):**\n[Download DMG](https://voicebox.sh/download/mac-arm){:target=\"_blank\"}\n\n**For macOS (Intel):**\n[Download DMG](https://voicebox.sh/download/mac-intel){:target=\"_blank\"}\n\n**For Windows:**\n[Download MSI](https://voicebox.sh/download/windows){:target=\"_blank\"}\n\n**For Docker:**\n```bash\ndocker compose up\n```\n\nFor detailed instructions, including building from source on Linux, please refer to the [official documentation](https://docs.voicebox.sh){:target=\"_blank\"}.\n\n## Examples\n\nVoicebox provides a robust API for integration into your own applications and scripts. 
Here are some examples of how to interact with the Voicebox API:\n\n**Generate speech:**\n```bash\ncurl -X POST http://127.0.0.1:17493/generate \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"Hello world\", \"profile_id\": \"abc123\", \"language\": \"en\"}'\n```\n\n**Agent voice output:**\n```bash\ncurl -X POST http://127.0.0.1:17493/speak \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Voicebox-Client-Id: my-script\" \\\n  -d '{\"text\": \"Deploy complete.\", \"profile\": \"Morgan\"}'\n```\n\n**Transcribe an audio file:**\n```bash\ncurl -X POST http://127.0.0.1:17493/transcribe \\\n  -F \"audio=@recording.wav\" \\\n  -F \"model=whisper-turbo\"\n```\n\n**List voice profiles:**\n```bash\ncurl http://127.0.0.1:17493/profiles\n```\n\nVoicebox also ships with a built-in Model Context Protocol (MCP) server, allowing MCP-aware agents like Claude Code or Cursor to easily integrate voice capabilities.\n\n## Why Use Voicebox?\n\nVoicebox stands out as a powerful tool for anyone working with AI voice. Its local-first approach guarantees unparalleled privacy, as all your sensitive voice data and models remain on your machine. The extensive range of features, from multi-engine voice cloning and expressive speech generation to advanced audio effects and unlimited generation length, provides immense flexibility. Furthermore, its seamless integration with AI agents and global dictation capabilities make it an indispensable tool for developers, content creators, and anyone seeking a comprehensive, high-performance voice I/O solution. 
Built with Tauri (Rust) for native performance and supporting a wide array of GPUs, Voicebox delivers a fast and reliable experience across different platforms.\n\n## Links\n\n*   **GitHub Repository:** [jamiepine/voicebox](https://github.com/jamiepine/voicebox){:target=\"_blank\"}\n*   **Official Website:** [voicebox.sh](https://voicebox.sh){:target=\"_blank\"}\n*   **Documentation:** [docs.voicebox.sh](https://docs.voicebox.sh){:target=\"_blank\"}\n*   **Latest Releases:** [GitHub Releases](https://github.com/jamiepine/voicebox/releases/latest){:target=\"_blank\"}","metrics":{"detailViews":2,"githubClicks":1},"dates":{"published":null,"modified":"2026-06-25T07:47:56.000Z"}}