Voicebox: The Open-Source AI Voice Studio for Cloning and Dictation

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

Voicebox is an open-source AI voice studio for cloning voices, generating speech in multiple languages, and dictating into any application. It provides a local-first voice I/O stack as an alternative to cloud-based services. Because it runs entirely on your local machine, your voice data stays private and under your control.

Repository Information

Analyzed by OSRepos on June 25, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

Voicebox is designed for local-first operation and positions itself as an alternative to cloud-based services such as ElevenLabs and WisprFlow. It clones voices from short audio samples, generates speech in 23 languages across 7 TTS engines, and dictates text into any application through a global hotkey. It also integrates with AI agents, giving them a full voice input/output stack that runs entirely on your machine, so models, voice data, and captures never leave it.

Key features include:

  • Complete privacy: All models, voice data, and captures remain on your local machine.
  • Diverse TTS engines: Access 7 different Text-to-Speech engines, including Qwen3-TTS and LuxTTS.
  • Multi-language support: Generate speech in 23 languages, from English to Arabic, Japanese, and Hindi.
  • Voice cloning and presets: Create zero-shot voice clones or utilize over 50 curated preset voices.
  • Advanced audio effects: Apply pitch shift, reverb, delay, and other post-processing effects.
  • Global dictation: Use a hotkey for system-wide voice input, with Whisper-based Speech-to-Text.
  • Agent integration: Enable AI agents to speak in cloned voices via a simple API.

Installation

Pre-built binaries are available for macOS and Windows, and Docker Compose covers containerized deployment. Linux users can build from source.

For macOS (Apple Silicon):
Download DMG

For macOS (Intel):
Download DMG

For Windows:
Download MSI

For Docker:

docker compose up

For detailed instructions, including building from source on Linux, please refer to the official documentation.

Examples

Voicebox exposes a local HTTP API for integration into your own applications and scripts. The examples below call it at 127.0.0.1:17493:

Generate speech:

curl -X POST http://127.0.0.1:17493/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "profile_id": "abc123", "language": "en"}'
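The same request can be prepared from a script. The following is a minimal Python sketch using only the standard library, assuming the endpoint and JSON fields shown in the curl command above. The helper names build_generate_request and send are hypothetical, and because the response format is not described here, the sketch returns the raw response bytes without interpreting them.

```python
import json
import urllib.request

# Base URL taken from the curl example; adjust if your instance differs.
VOICEBOX_URL = "http://127.0.0.1:17493"


def build_generate_request(text, profile_id, language="en", base_url=VOICEBOX_URL):
    """Build (but do not send) a POST /generate request.

    Hypothetical helper: the path and JSON fields mirror the curl example,
    nothing else about the API is assumed.
    """
    payload = {"text": text, "profile_id": profile_id, "language": language}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return the raw response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a Voicebox instance running, send(build_generate_request("Hello world", "abc123")) issues the same call as the curl command. Keeping construction separate from sending makes the payload easy to inspect without a running server.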

Agent voice output:

curl -X POST http://127.0.0.1:17493/speak \
  -H "Content-Type: application/json" \
  -H "X-Voicebox-Client-Id: my-script" \
  -d '{"text": "Deploy complete.", "profile": "Morgan"}'
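An agent or build script usually should not fail just because the voice server is not running. The sketch below, in Python, wraps the /speak call shown above so that speech becomes a best-effort step. The function name announce and its opener parameter are hypothetical; only the path, the X-Voicebox-Client-Id header, and the two JSON fields come from the curl example.

```python
import json
import urllib.request


def announce(text, profile, client_id="my-script",
             base_url="http://127.0.0.1:17493",
             opener=urllib.request.urlopen):
    """POST text to /speak; return True on success, False if the call fails.

    `opener` is injectable so the function can be exercised without a server.
    """
    request = urllib.request.Request(
        f"{base_url}/speak",
        data=json.dumps({"text": text, "profile": profile}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Voicebox-Client-Id": client_id,
        },
        method="POST",
    )
    try:
        with opener(request, timeout=10):
            return True
    except OSError:
        # urllib's URLError and HTTPError are both subclasses of OSError,
        # so this covers a refused connection as well as an HTTP error status.
        return False
```

A deploy script could end with announce("Deploy complete.", "Morgan") and ignore the return value, so the deployment result never depends on whether audio was played.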

Transcribe an audio file:

curl -X POST http://127.0.0.1:17493/transcribe \
  -F "audio=@recording.wav" \
  -F "model=whisper-turbo"
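The -F flags make curl send a multipart/form-data body, which Python's standard library does not build for you. The sketch below encodes one by hand, assuming the field names audio and model from the curl command above. The helper names encode_multipart and build_transcribe_request are hypothetical, and the file part is labeled application/octet-stream as a neutral default.

```python
import uuid
import urllib.request
from pathlib import Path


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(audio_path, model="whisper-turbo",
                             base_url="http://127.0.0.1:17493"):
    """Build (but do not send) a POST /transcribe request for a local file."""
    path = Path(audio_path)
    body, content_type = encode_multipart(
        {"model": model}, "audio", path.name, path.read_bytes()
    )
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

The request can then be passed to urllib.request.urlopen. A third-party HTTP client would shorten this considerably; the manual version is shown only to keep the example dependency-free.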

List voice profiles:

curl http://127.0.0.1:17493/profiles

Voicebox also ships with a built-in Model Context Protocol (MCP) server, so MCP-aware agents such as Claude Code or Cursor can use its voice capabilities.

Why Use Voicebox?

Voicebox's main draw is its local-first design: voice data and models stay on your machine. Its features span multi-engine voice cloning, expressive speech generation, audio effects, and unlimited generation length. Agent integration and global dictation make it useful for developers, content creators, and anyone who wants a complete voice I/O stack. It is built with Tauri (Rust) for native performance and supports a wide range of GPUs.

Links

Related repositories

Similar repositories that may be relevant next.

Headroom: Drastically Reduce LLM Token Usage for AI Agents

June 25, 2026

Headroom is an innovative context compression layer for AI agents, designed to significantly reduce token usage for LLMs. It achieves 60-95% fewer tokens across various inputs like tool outputs, logs, files, and RAG chunks, all while preserving answer accuracy. This powerful tool enhances efficiency and cost-effectiveness for AI interactions.

Tags: AI, LLM, Token Optimization

Dexter: An Autonomous Agent for Deep Financial Research

June 22, 2026

Dexter is an autonomous financial research agent designed to think, plan, and learn while performing analysis. It leverages task planning, self-reflection, and real-time market data to tackle complex financial questions. This project provides a powerful tool for in-depth financial exploration, emphasizing its educational and informational purposes.

Tags: TypeScript, AI, Financial Research

PixelRAG: Pixel-Native Search for Visual Retrieval-Augmented Generation

June 22, 2026

PixelRAG revolutionizes search by enabling pixel-native retrieval, moving beyond traditional text parsing. It renders documents as screenshots, preserving visual context like tables and charts, which is crucial for accurate answers from reader models. This allows for searching any document based on its visual appearance, not just its textual content.

Tags: Python, AI, RAG

GLM-5: Flagship Models for Long-Horizon Agentic Engineering

June 18, 2026

GLM-5 is a series of flagship models, including GLM-5.2, GLM-5.1, and GLM-5, developed by zai-org for complex systems engineering and long-horizon agentic tasks. These models offer advanced coding capabilities, impressive context lengths, and state-of-the-art performance on various benchmarks. They are designed to sustain effective problem-solving over extended sessions through iterative reasoning and strategy revision.

Tags: agentic-ai, coding, llm

Source repository

Open the original repository on GitHub.
