whisper.cpp: High-Performance Speech Recognition with OpenAI's Whisper Model

Summary
whisper.cpp is a high-performance C/C++ port of OpenAI's Whisper automatic speech recognition (ASR) model. It offers efficient, dependency-free inference across a wide range of platforms, from desktop to mobile and embedded devices. This project enables fast, local speech-to-text capabilities, making advanced AI accessible for various applications.
Introduction
whisper.cpp is a C/C++ port of OpenAI's Whisper automatic speech recognition (ASR) model. It is designed for efficiency and portability: the model runs locally, without external dependencies, which makes speech-to-text practical on a wide range of devices.
Key features include optimized performance on Apple Silicon (via ARM NEON, Accelerate, Metal, and Core ML), AVX intrinsics for x86, VSX intrinsics for POWER architectures, and hardware acceleration on NVIDIA GPUs (cuBLAS), Vulkan, OpenVINO, Ascend NPUs, and Moore Threads GPUs. It also supports mixed F16/F32 precision and integer quantization, and performs zero memory allocations at runtime. whisper.cpp runs on macOS, iOS, Android, Linux, Windows, WebAssembly, Raspberry Pi, and inside Docker containers.
Installation
Getting started with whisper.cpp is straightforward. Follow these steps for a quick setup:
Clone the repository:
    git clone https://github.com/ggml-org/whisper.cpp.git
Navigate into the directory:
    cd whisper.cpp
Download a Whisper model (e.g., base.en):
    sh ./models/download-ggml-model.sh base.en
Build the project and transcribe an audio file:
    cmake -B build
    cmake --build build -j --config Release
    ./build/bin/whisper-cli -f samples/jfk.wav
For a quick demo, you can also simply run
    make base.en
to download the model and transcribe sample audio files.
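The hardware backends listed in the introduction are not part of the default build; they are selected when CMake configures the project. The sketch below maps a short backend name to the configure option that, to the best of our knowledge, the project README documented at the time of writing. The function name `backend_flag` and the short names (`cuda`, `vulkan`, and so on) are illustrative assumptions, and the option names may change between releases, so check the README of your checkout before relying on them.

```shell
#!/bin/sh
# backend_flag NAME
# Print the CMake option assumed to enable the given backend.
# The short names are hypothetical labels chosen for this sketch.
backend_flag() {
    case "$1" in
        cpu)      echo "" ;;                      # default build, no extra option
        cuda)     echo "-DGGML_CUDA=1" ;;         # NVIDIA GPUs
        vulkan)   echo "-DGGML_VULKAN=1" ;;       # Vulkan-capable GPUs
        coreml)   echo "-DWHISPER_COREML=1" ;;    # Apple Core ML
        openvino) echo "-DWHISPER_OPENVINO=1" ;;  # Intel OpenVINO
        cann)     echo "-DGGML_CANN=1" ;;         # Ascend NPU
        musa)     echo "-DGGML_MUSA=1" ;;         # Moore Threads GPUs
        *)        echo "unknown backend: $1" >&2; return 1 ;;
    esac
}
```

With the function defined, a CUDA build would replace the plain configure step with `cmake -B build $(backend_flag cuda)`, followed by the same `cmake --build build -j --config Release` as before. Most backends also need their vendor toolkit installed first.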
Examples
whisper.cpp offers a variety of examples showcasing its versatility:
Real-time Audio Input: The stream example enables continuous transcription from your microphone, ideal for live applications. Requires SDL2.
    cmake -B build -DWHISPER_SDL2=ON
    cmake --build build -j --config Release
    ./build/bin/whisper-stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
Karaoke-style Movie Generation: Generate videos where the currently spoken word is highlighted, perfect for educational content or fun. This requires ffmpeg.
    ./build/bin/whisper-cli -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -owts
    source ./samples/jfk.wav.wts
    ffplay ./samples/jfk.wav.mp4
Voice Activity Detection (VAD): Integrate VAD models like Silero-VAD to process only speech segments, significantly speeding up transcription.
    ./models/download-vad-model.sh silero-v5.1.2
    ./build/bin/whisper-cli -vm ./models/ggml-silero-v5.1.2.bin --vad -f samples/jfk.wav -m models/ggml-base.en.bin
Mobile Applications: Examples for iOS and Android demonstrate on-device, offline transcription.
WebAssembly: Run Whisper directly in your browser with whisper.wasm.
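The command-line examples above pass a model file such as ./models/ggml-base.en.bin and assume it has already been downloaded. A minimal sketch of a guard that fetches the model only when it is missing is shown below. The helper names `model_path` and `ensure_model` are hypothetical, and the sketch assumes it is run from the root of the whisper.cpp checkout, with the download script writing to ./models as in the installation steps.

```shell
#!/bin/sh
# model_path NAME
# Map a model name such as "base.en" to the path used in the examples.
model_path() {
    printf './models/ggml-%s.bin\n' "$1"
}

# ensure_model NAME
# Download the model only if it is not already on disk, then print its path.
# Download progress goes to stderr so that stdout carries only the path.
ensure_model() {
    path=$(model_path "$1")
    if [ ! -f "$path" ]; then
        sh ./models/download-ggml-model.sh "$1" >&2 || return 1
    fi
    printf '%s\n' "$path"
}
```

A transcription command can then be written as `./build/bin/whisper-cli -m "$(ensure_model base.en)" -f samples/jfk.wav`, which downloads the model on the first run and reuses it afterwards.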
Why Use It
whisper.cpp stands out for several compelling reasons:
High Performance: Inference is often faster than real time, thanks to optimizations for several CPU architectures and support for hardware accelerators (NVIDIA GPUs, Vulkan, OpenVINO, Apple Neural Engine, Ascend NPU, Moore Threads).
Portability: Written in plain C/C++ with no external dependencies, it is easy to integrate into other projects and to deploy on platforms ranging from embedded systems to servers.
Resource Efficiency: Mixed precision, integer quantization, and zero runtime memory allocations keep resource consumption low, which suits constrained environments.
Active Development & Community: Benefits from continuous improvements, a clear roadmap, and a vibrant community contributing to its development and offering numerous language bindings (Rust, JavaScript, Go, Java, Ruby, .NET, Python, R, Unity).
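The faster-than-real-time claim above can be made concrete with the real-time factor (RTF): processing time divided by audio duration. An RTF below 1 means the transcription finishes in less time than the audio takes to play. The sketch below computes it with awk; the function name `real_time_factor` is hypothetical, and the timings in the usage note are invented for illustration, not measurements.

```shell
#!/bin/sh
# real_time_factor PROCESSING_SECONDS AUDIO_SECONDS
# Print processing time divided by audio duration, to two decimals.
# Returns a non-zero status if the audio duration is not positive.
real_time_factor() {
    awk -v p="$1" -v a="$2" 'BEGIN {
        if (a <= 0) { exit 1 }
        printf "%.2f\n", p / a
    }'
}
```

For example, if an 11-second clip took 2.2 seconds to transcribe, `real_time_factor 2.2 11` would print 0.20, meaning the run was five times faster than real time. Measuring your own value with `time ./build/bin/whisper-cli ...` on representative audio is more informative than any single quoted figure, since the result depends on model size, hardware, and backend.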
Links
- GitHub Repository: https://github.com/ggml-org/whisper.cpp
- Hugging Face Models: https://huggingface.co/ggerganov/whisper.cpp
- Discussions & FAQ: https://github.com/ggml-org/whisper.cpp/discussions
- Project Roadmap: https://github.com/orgs/ggml-org/projects/4/