Voicebox: The Open-Source AI Voice Studio for Cloning and Dictation

Summary

Voicebox is an open-source AI voice studio for cloning voices, generating speech in multiple languages, and dictating into any application. It provides a local-first voice I/O stack as an alternative to cloud-based services. Everything runs on your local machine, so your voice data stays private and under your control.

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

Voicebox is designed for local-first operation and positions itself as an alternative to cloud-based services such as ElevenLabs and WisprFlow. It clones voices from short audio samples, generates speech in 23 languages across 7 TTS engines, and dictates text into any application through a global hotkey. Voicebox also integrates with AI agents, so the full voice input/output stack runs on your machine and your data never has to leave it.

Key features include:

Complete privacy: All models, voice data, and captures remain on your local machine.
Diverse TTS engines: Access 7 different Text-to-Speech engines, including Qwen3-TTS and LuxTTS.
Multi-language support: Generate speech in 23 languages, from English to Arabic, Japanese, and Hindi.
Voice cloning and presets: Create zero-shot voice clones or utilize over 50 curated preset voices.
Advanced audio effects: Apply pitch shift, reverb, delay, and other post-processing effects.
Global dictation: Use a hotkey for system-wide voice input, with Whisper-based Speech-to-Text.
Agent integration: Enable AI agents to speak in cloned voices via a simple API.

Installation

Pre-built binaries are available for macOS and Windows, Docker covers containerized deployment, and Linux users can build from source.

For macOS (Apple Silicon):
Download DMG

For macOS (Intel):
Download DMG

For Windows:
Download MSI

For Docker:

docker compose up

For detailed instructions, including building from source on Linux, please refer to the official documentation.

Examples

Voicebox exposes an HTTP API for integration into your own applications and scripts. The examples below call it on the local address 127.0.0.1:17493:

Generate speech:

curl -X POST http://127.0.0.1:17493/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "profile_id": "abc123", "language": "en"}'
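
The same call can be made from a script. What follows is a minimal Python sketch using only the standard library. It assumes the address and JSON fields shown in the curl command above. The helper names `build_generate_request` and `send` are hypothetical, and because the response format is not described here, the body is returned as raw bytes rather than parsed.

```python
import json
import urllib.request

# Address taken from the curl example; change it if your instance differs.
BASE_URL = "http://127.0.0.1:17493"


def build_generate_request(text, profile_id, language="en"):
    """Build a POST /generate request (hypothetical helper, nothing is sent)."""
    payload = {"text": text, "profile_id": profile_id, "language": language}
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return (status, raw_body).

    The body stays as bytes because the response format is an unknown
    here; inspect it before deciding how to parse or save it.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

With a Voicebox instance running, `send(build_generate_request("Hello world", "abc123"))` issues the same request as the curl command.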

Agent voice output:

curl -X POST http://127.0.0.1:17493/speak \
  -H "Content-Type: application/json" \
  -H "X-Voicebox-Client-Id: my-script" \
  -d '{"text": "Deploy complete.", "profile": "Morgan"}'
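
In an agent or deploy script, a spoken notification is usually best-effort: the script should carry on if Voicebox is not running. The sketch below illustrates that pattern in Python with the standard library. It assumes the `/speak` endpoint, the `X-Voicebox-Client-Id` header, and the JSON fields from the curl command above. The function names, the timeout, and the injectable `opener` parameter are hypothetical choices made for illustration.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:17493"  # from the curl example


def build_speak_request(text, profile, client_id="my-script"):
    """Build a POST /speak request tagged with X-Voicebox-Client-Id."""
    return urllib.request.Request(
        f"{BASE_URL}/speak",
        data=json.dumps({"text": text, "profile": profile}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Voicebox-Client-Id": client_id,
        },
        method="POST",
    )


def try_speak(text, profile, opener=urllib.request.urlopen):
    """Best-effort announcement: True on a 2xx reply, False on any failure.

    `opener` is injectable so the function can be exercised without a server.
    """
    request = build_speak_request(text, profile)
    try:
        with opener(request, timeout=10) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

Calling `try_speak("Deploy complete.", "Morgan")` at the end of a script then returns `False` instead of raising when the server is unreachable.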

Transcribe an audio file:

curl -X POST http://127.0.0.1:17493/transcribe \
  -F "audio=@recording.wav" \
  -F "model=whisper-turbo"
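
The `-F` flags make curl send a multipart/form-data body with two parts, `audio` and `model`. The Python sketch below builds an equivalent body by hand with the standard library, to show what goes over the wire. The field names come from the curl command above. The helper name and the `application/octet-stream` part type are assumptions, and whether the server inspects the part type is not stated here.

```python
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:17493"  # from the curl example


def build_transcribe_request(filename, audio_bytes, model="whisper-turbo"):
    """Build a multipart/form-data POST /transcribe request (nothing is sent)."""
    boundary = uuid.uuid4().hex
    # Part 1: the file, under the form field name "audio" used by curl -F.
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + audio_bytes + b"\r\n"
    # Part 2: the plain-text "model" field.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    # The closing delimiter ends the multipart body.
    closing = f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/transcribe",
        data=file_part + model_part + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, read the file with `pathlib.Path("recording.wav").read_bytes()`, pass the bytes to `build_transcribe_request`, and hand the result to `urllib.request.urlopen`.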

List voice profiles:

curl http://127.0.0.1:17493/profiles
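
A script can fetch the same listing with a plain GET. The Python sketch below uses only the standard library and assumes nothing beyond the `/profiles` path shown above. The response shape is not documented here, so the hypothetical helper `decode_body` parses the body as JSON only if that succeeds and otherwise returns it as text.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:17493"  # from the curl example


def decode_body(raw):
    """Return parsed JSON if the body is JSON, else the decoded text."""
    text = raw.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except ValueError:
        return text


def list_profiles(opener=urllib.request.urlopen):
    """GET /profiles and return whatever the server sent, decoded.

    `opener` is injectable so the function can be exercised without a server.
    """
    with opener(f"{BASE_URL}/profiles", timeout=10) as response:
        return decode_body(response.read())
```

Printing the result of `list_profiles()` against a running instance is a quick way to find the profile identifiers to use in the other calls.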

Voicebox also ships with a built-in Model Context Protocol (MCP) server, so MCP-aware agents such as Claude Code or Cursor can use its voice capabilities.

Why Use Voicebox?

Voicebox's local-first approach keeps your voice data and models on your machine. Its feature set spans multi-engine voice cloning, expressive speech generation, audio effects, and unlimited generation length. Agent integration and global dictation make it useful for developers, content creators, and anyone who needs a complete voice I/O stack. It is built with Tauri (Rust) for native performance and supports a wide range of GPUs across platforms.
