sherpa-onnx: Offline Speech AI for Any Platform and Language

Introduction

sherpa-onnx is an open-source library for offline speech processing. Built on next-gen Kaldi and ONNX Runtime, it provides speech-to-text (ASR), text-to-speech (TTS), speaker diarization, speech enhancement, source separation, and voice activity detection (VAD). It runs entirely without an internet connection, which makes it well suited to embedded systems, mobile applications, and environments with limited connectivity.

The project supports the x86, x64, ARM (32-bit and 64-bit), and RISC-V architectures on Linux, macOS, Windows, Android, iOS, and HarmonyOS. It also integrates with NPUs, including Rockchip, Qualcomm, Ascend, and Axera, and provides APIs for 12 programming languages, including C++, C, Python, Java, C#, Go, and JavaScript.

Installation

Getting started with sherpa-onnx typically involves cloning the repository and following the build instructions specific to your target platform and programming language. The project provides detailed guides for different environments, including embedded systems, mobile, and desktop. Pre-trained models, essential for running the various speech tasks offline, are available for download from the official releases.

For comprehensive installation instructions and platform-specific details, please refer to the official documentation of sherpa-onnx.
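As a rough illustration of what running a downloaded model offline can look like from Python, here is a minimal sketch. It assumes the `sherpa_onnx` Python package and NumPy are installed, that the model is a transducer, and that the input is a 16-bit mono WAV file. The file names `encoder.onnx`, `decoder.onnx`, `joiner.onnx`, and `tokens.txt` and the helper names `missing_model_files` and `transcribe` are hypothetical placeholders; real model archives use their own file names, so check the official documentation for the model you download.

```python
from pathlib import Path
import wave

# Hypothetical file names; real model archives use their own naming.
REQUIRED_FILES = ("encoder.onnx", "decoder.onnx", "joiner.onnx", "tokens.txt")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    base = Path(model_dir)
    return [name for name in required if not (base / name).is_file()]


def transcribe(model_dir, wav_path):
    """Transcribe a 16-bit mono WAV file with an offline transducer model."""
    missing = missing_model_files(model_dir)
    if missing:
        raise FileNotFoundError(f"model files not found: {', '.join(missing)}")

    # Imported lazily so the file check above works without these packages.
    import numpy as np
    import sherpa_onnx

    base = Path(model_dir)
    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=str(base / "encoder.onnx"),
        decoder=str(base / "decoder.onnx"),
        joiner=str(base / "joiner.onnx"),
        tokens=str(base / "tokens.txt"),
    )

    with wave.open(str(wav_path), "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        sample_rate = f.getframerate()
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

    # Scale 16-bit integers to float32 samples in [-1, 1).
    samples = pcm.astype(np.float32) / 32768.0

    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text
```

Calling `transcribe("path/to/model", "audio.wav")` would return the recognized text. Checking for the model files first gives a clear error message when a download is incomplete, and nothing in the sketch needs a network connection once the files are on disk.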

Examples

sherpa-onnx offers a wide range of functionalities, from real-time speech recognition to speaker diarization and text-to-speech. You can explore its capabilities directly through various Hugging Face Spaces that demonstrate different features in a browser, requiring no local installation.

Pre-built Android APKs and Flutter applications are also available, showcasing streaming ASR, TTS, and VAD, and the project targets devices such as Raspberry Pi and various NPUs as well. The project's README also highlights several notable community projects that use sherpa-onnx, such as:

BreezeApp: A mobile AI application for Android and iOS providing offline speech-to-text, text-to-speech, and chatbot interactions.
Open-LLM-VTuber: Enables hands-free voice interaction with LLMs and Live2D, running locally across platforms.
TMSpeech: A Windows real-time subtitle software using streaming ASR in C# with a graphical user interface.
QSmartAssistant: A modular, fully offline, low-resource dialogue robot/smart speaker built with Qt, using both ASR and TTS.

Why Use It

The primary appeal of sherpa-onnx is that it works offline, which preserves privacy and removes any dependency on cloud services. It supports a wide range of platforms, from high-performance servers to resource-constrained embedded devices and mobile operating systems, and its broad language API coverage simplifies integration into existing projects regardless of the development stack. Combined with its suite of speech AI features, this makes sherpa-onnx a flexible and accessible option for many speech processing needs.
