tortoise.cpp: Local Text-to-Speech with GGML and C++

Summary
tortoise.cpp is a C++ re-implementation of the popular Tortoise-TTS model, leveraging the efficient GGML library. This project enables high-quality, local text-to-speech generation without the need for Python dependencies. It aims to make advanced speech synthesis more accessible and performant on various hardware configurations.
Introduction
tortoise.cpp brings the Tortoise-TTS text-to-speech model to a local C++ environment using the GGML library. Users can generate high-quality speech directly on their own machines, as an alternative to Python-based setups. Through GGML, the project targets several backends: CPU, CUDA-enabled GPUs, and Apple Metal.
Installation
To get started with tortoise.cpp, you'll first need to clone the repository and then compile it for your specific system. Ensure you clone recursively to fetch all necessary submodules.
Downloading
git clone --recursive https://github.com/balisujohn/tortoise.cpp.git
Compiling
For CPU (Linux x86 and Mac ARM):
mkdir build
cd build
cmake ..
make
For CUDA-enabled GPUs (e.g., Ubuntu 22.04 with CUDA 12.0):
mkdir build
cd build
cmake .. -DGGML_CUBLAS=ON
make
For macOS with Metal (work in progress):
mkdir build
cd build
cmake .. -DGGML_METAL=ON
make
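The three build variants above differ only in the flag passed to cmake. As a rough sketch, that mapping can be captured in a small shell helper. The function name backend_flag and the backend labels cpu, cuda, and metal are assumptions made for illustration and are not part of the project; the flags themselves are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of tortoise.cpp): print the CMake flag
# for a given backend label, using the flags from the build steps above.
backend_flag() {
    case "$1" in
        cpu)   echo "" ;;                   # plain CPU build needs no extra flag
        cuda)  echo "-DGGML_CUBLAS=ON" ;;   # CUDA-enabled GPUs
        metal) echo "-DGGML_METAL=ON" ;;    # macOS with Metal (work in progress)
        *)     echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Possible use from inside the build directory:
#   cmake .. $(backend_flag cuda) && make
```

Keeping the flag lookup separate from the cmake call makes it easy to print the flag first and confirm it before starting a long build.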
Examples
Before running, you must download the necessary ggml-model.bin, ggml-vocoder-model.bin, and ggml-diffusion-model.bin files. These models can be found on Hugging Face.
Download Models: https://huggingface.co/balisujohn/tortoise-ggml
Place these files in the models directory within your tortoise.cpp project.
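Before the first run, it can help to confirm that all three files are in place. The following is a minimal sketch of such a check; the function name check_models is hypothetical, and the directory is passed as an argument rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper (not part of tortoise.cpp): report which of the
# three required model files are missing from the given directory.
check_models() {
    dir=$1
    missing=0
    for f in ggml-model.bin ggml-vocoder-model.bin ggml-diffusion-model.bin; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when every file is present, 1 otherwise.
    return "$missing"
}

# Possible use from the build directory, assuming models sits next to it:
#   check_models ../models && ./tortoise
```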
Basic Run Command (from the build directory):
./tortoise
Example with custom message, voice, and output:
./tortoise --message "based... dr freeman?" --voice "../models/mouse.bin" --seed 0 --output "based?.wav"
Note that only lowercase letters, spaces, and punctuation are supported in the prompt message.
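One way to respect this restriction is to lowercase the message and reject anything else before calling the binary. The sketch below assumes that "punctuation" means the ASCII punctuation class; the exact set of characters tortoise.cpp accepts is not stated here, so treat the check as approximate. The function name normalize_message is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of tortoise.cpp): lowercase a message and
# fail if it contains anything other than lowercase ASCII letters, spaces,
# and ASCII punctuation (an assumed reading of the rule above).
normalize_message() {
    # The C locale keeps the character classes limited to ASCII.
    msg=$(printf '%s' "$1" | LC_ALL=C tr '[:upper:]' '[:lower:]')
    if printf '%s' "$msg" | LC_ALL=C grep -q '[^a-z [:punct:]]'; then
        echo "unsupported character in message: $msg" >&2
        return 1
    fi
    printf '%s\n' "$msg"
}

# Possible use from the build directory:
#   ./tortoise --message "$(normalize_message 'Based... Dr Freeman?')" --output "out.wav"
```

Digits are rejected rather than silently dropped, so a message such as "room 101" fails loudly and can be rewritten by hand ("room one oh one") instead of being synthesized with words missing.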
Why Use It
Reasons to consider tortoise.cpp for text-to-speech work:
- Local Execution: Perform speech synthesis entirely on your local machine, ensuring privacy and reducing reliance on cloud services.
- Efficiency with GGML: Builds on the GGML library, which is designed for efficient inference on commodity hardware and may help on resource-constrained machines.
- C++ Implementation: Benefit from the speed and control offered by C++, without the overhead of Python environments.
- Cross-Platform Support: Compile and run on various operating systems and hardware, including CPUs, NVIDIA GPUs (CUDA), and Apple Silicon (Metal, in progress).
- Open Source: The project is open source under the MIT License, encouraging contributions and community-driven development.
Links
- GitHub Repository: https://github.com/balisujohn/tortoise.cpp
- Hugging Face Models: https://huggingface.co/balisujohn/tortoise-ggml
- Support Development: Ko-fi for John Balis