sherpa-onnx: Offline Speech AI for Any Platform and Language

Summary
sherpa-onnx is an open-source library for offline speech processing, including speech-to-text, text-to-speech, and speaker diarization. Built on next-gen Kaldi and ONNX Runtime, it runs on embedded systems, mobile devices, and desktop platforms without an internet connection, and it provides APIs for 12 programming languages.
Introduction
sherpa-onnx is an advanced open-source library designed for comprehensive offline speech processing. It leverages next-gen Kaldi with ONNX Runtime to deliver a wide array of functionalities, including speech-to-text (ASR), text-to-speech (TTS), speaker diarization, speech enhancement, source separation, and voice activity detection (VAD). A key advantage is its ability to operate entirely without an internet connection, making it ideal for embedded systems, mobile applications, and environments with limited connectivity.
The project supports a wide range of platforms: architectures including x86, x64, ARM (32-bit and 64-bit), and RISC-V, and operating systems including Linux, macOS, Windows, Android, iOS, and HarmonyOS. It also integrates with several NPUs, including Rockchip, Qualcomm, Ascend, and Axera, and provides APIs for 12 programming languages, among them C++, C, Python, Java, C#, Go, and JavaScript.
Installation
Getting started with sherpa-onnx typically involves cloning the repository and following the build instructions specific to your target platform and programming language. The project provides detailed guides for different environments, including embedded systems, mobile, and desktop. Pre-trained models, essential for running the various speech tasks offline, are available for download from the official releases.
For comprehensive installation instructions and platform-specific details, please refer to the official documentation of sherpa-onnx.
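For Python specifically, the package is also published on PyPI, so `pip install sherpa-onnx` is usually enough and no source build is needed. The sketch below is one way to confirm that the package is present before running anything else. The helper name `installed_version` is made up for this illustration and is not part of sherpa-onnx.

```python
from importlib import metadata


def installed_version(distribution="sherpa-onnx"):
    """Return the installed version string of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for this example only.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("sherpa-onnx is not installed; try: pip install sherpa-onnx")
    else:
        print(f"sherpa-onnx {version} is installed")
```

Other language bindings and embedded targets have their own setup steps, which the official documentation covers.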
Examples
sherpa-onnx offers a wide range of functionalities, from real-time speech recognition to speaker diarization and text-to-speech. You can explore its capabilities directly through various Hugging Face Spaces that demonstrate different features in a browser, requiring no local installation.
Pre-built Android APKs and Flutter applications are also available, showcasing streaming ASR, TTS, and VAD. The project additionally targets devices like Raspberry Pi and various NPUs. Its README highlights several notable community projects built on sherpa-onnx, such as:
- BreezeApp: A mobile AI application for Android and iOS providing offline speech-to-text, text-to-speech, and chatbot interactions.
- Open-LLM-VTuber: Enables hands-free voice interaction with LLMs and Live2D, running locally across platforms.
- TMSpeech: Real-time subtitle software for Windows, written in C# with a graphical user interface and built on streaming ASR.
- QSmartAssistant: A modular, fully offline, low-resource dialogue robot/smart speaker built with Qt that uses both ASR and TTS.
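To try offline speech-to-text locally rather than in a browser demo, a minimal Python sketch might look like the following. It assumes a non-streaming transducer model has already been downloaded. The file names `encoder.onnx`, `decoder.onnx`, `joiner.onnx`, and `tokens.txt` are placeholders, since real model archives use their own names, and the helpers `read_wave_mono_float` and `transcribe` are invented for this illustration. The recognizer calls follow the pattern used in the project's Python examples, but check them against the documentation for the version you have installed.

```python
import struct
import wave
from pathlib import Path


def read_wave_mono_float(path):
    """Read a 16-bit mono PCM WAV file.

    Returns (sample_rate, samples) with samples scaled to the range [-1, 1).
    Hypothetical helper for this example; uses only the standard library.
    """
    with wave.open(str(path), "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono PCM WAV file")
        sample_rate = f.getframerate()
        raw = f.readframes(f.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return sample_rate, [s / 32768.0 for s in ints]


def transcribe(wav_path, model_dir):
    """Transcribe one WAV file with a non-streaming transducer model.

    The model file names below are assumptions; adjust them to your model.
    """
    # Imported lazily so the WAV helper works without these packages installed.
    import numpy as np
    import sherpa_onnx

    model_dir = Path(model_dir)
    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=str(model_dir / "encoder.onnx"),
        decoder=str(model_dir / "decoder.onnx"),
        joiner=str(model_dir / "joiner.onnx"),
        tokens=str(model_dir / "tokens.txt"),
        num_threads=1,
    )

    sample_rate, samples = read_wave_mono_float(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, np.asarray(samples, dtype=np.float32))
    recognizer.decode_stream(stream)
    return stream.result.text
```

Calling `transcribe("speech.wav", "path/to/model")` would return the recognized text as a string. Everything runs on the local machine, so no network access is needed once the model files are on disk.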
Why Use It
The primary appeal of sherpa-onnx is that it runs entirely offline: audio never leaves the device, which helps with privacy, and nothing depends on a cloud service being reachable. Its platform support spans servers, resource-constrained embedded devices, and mobile operating systems, and APIs for 12 programming languages make it straightforward to integrate into most development stacks. Because ASR, TTS, speaker diarization, speech enhancement, source separation, and VAD are all available in one library, a single dependency can cover a wide range of speech processing needs.
Links
- GitHub Repository: https://github.com/k2-fsa/sherpa-onnx
- Official Documentation: https://k2-fsa.github.io/sherpa/onnx/