Whisper Web: ML-Powered Speech Recognition Directly in Your Browser

Summary

Whisper Web brings ML-powered speech recognition directly to your browser using 🤗 Transformers.js. Audio is processed client-side, so transcription works without sending recordings to a cloud service. The project also includes experimental WebGPU support for accelerated performance.

Introduction

Whisper Web, developed by Xenova, runs ML-powered speech recognition entirely within your web browser. It uses 🤗 Transformers.js to run the model client-side, so no server-side processing is needed and audio can be transcribed without being sent to external servers. Experimental WebGPU support adds GPU acceleration for faster transcription in compatible browsers.
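
As a rough illustration of what client-side transcription can look like in code, here is a minimal TypeScript sketch. It is not taken from the Whisper Web source. The `Transcriber` and `PipelineFactory` types, the `transcribeLocally` helper, the default model id, and the chunking values are assumptions made for this example. The factory is passed in as a parameter so the sketch stays self-contained.

```typescript
// Shape of a speech-recognition pipeline as this sketch assumes it:
// a function that takes mono audio samples and resolves to { text }.
type Transcriber = (
  audio: Float32Array,
  options?: { chunk_length_s?: number; stride_length_s?: number },
) => Promise<{ text: string }>;

// Hypothetical factory type, modeled on the `pipeline(task, model)` call in Transformers.js.
type PipelineFactory = (task: string, model: string) => Promise<Transcriber>;

async function transcribeLocally(
  audio: Float32Array,
  createPipeline: PipelineFactory,
  model: string = 'Xenova/whisper-tiny.en', // assumed model id, chosen for illustration
): Promise<string> {
  // Creating the pipeline is the step that fetches the model;
  // the audio itself is only handed to the local transcriber.
  const transcriber = await createPipeline('automatic-speech-recognition', model);

  // Long recordings are processed in overlapping chunks (illustrative values).
  const result = await transcriber(audio, { chunk_length_s: 30, stride_length_s: 5 });
  return result.text.trim();
}
```

In a browser, `createPipeline` would typically be the `pipeline` function exported by Transformers.js (possibly with a type cast), and `audio` would be the decoded recording as mono samples. Whisper models expect 16 kHz audio, so recordings at other sample rates need resampling first.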

Installation

To get Whisper Web running locally, follow these simple steps:

Clone the repository and install dependencies:

```
git clone https://github.com/xenova/whisper-web.git
cd whisper-web
npm install
```

Run the development server:
```
npm run dev
```
Open the link (e.g., http://localhost:5173/) in your browser.
Note for Firefox users: You may need to set dom.workers.modules.enabled to true in about:config. This setting enables ES module support in Web Workers, which the app relies on. More details can be found in the issue linked from the project README.
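
To show why the Firefox setting matters, the sketch below creates a worker with the `{ type: 'module' }` option, which is what allows the worker script to use ES `import` statements. This is a minimal sketch, not the project's code: `WorkerLike`, `WorkerConstructor`, `startTranscriptionWorker`, and the `./worker.js` path are hypothetical names, and the constructor is injected so the example does not depend on browser typings.

```typescript
// Minimal structural types standing in for the browser's Worker API.
interface WorkerLike {
  postMessage(message: unknown): void;
}

type WorkerConstructor = new (
  scriptUrl: string,
  options?: { type?: 'classic' | 'module' },
) => WorkerLike;

function startTranscriptionWorker(
  WorkerCtor: WorkerConstructor,
  scriptUrl: string,
): WorkerLike {
  // A module worker can `import` other modules; browsers without
  // module-worker support fail to start a worker created this way.
  return new WorkerCtor(scriptUrl, { type: 'module' });
}
```

In a browser this would be called as `startTranscriptionWorker(Worker, './worker.js')`, after which audio can be sent to the worker with `postMessage` so that transcription does not block the page.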

Examples

Experience Whisper Web in action through its live demos:

Main Demo Site: https://huggingface.co/spaces/Xenova/whisper-web
Experimental WebGPU Demo: https://huggingface.co/spaces/Xenova/whisper-webgpu

Why Use It?

Whisper Web stands out for several compelling reasons:

Client-Side Processing: All speech recognition happens in the user's browser, so audio never has to leave the device. Once the model files have been downloaded and cached, transcription can also work offline.
Performance: With experimental WebGPU support, it can use your device's GPU for faster transcription in compatible browsers.
Ease of Integration: Built with TypeScript and JavaScript, it's straightforward to integrate into web applications.
Open Source: Licensed under MIT, it's free to use, modify, and distribute.
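
Because WebGPU support is experimental and not available in every browser, an application would normally check for it before choosing a backend. The following is a minimal feature-detection sketch under stated assumptions: `pickDevice`, `NavigatorLike`, `GpuLike`, and the `'webgpu'` / `'wasm'` labels are names chosen for this example, and the WebAssembly fallback is an assumption rather than a description of how Whisper Web decides.

```typescript
type Device = 'webgpu' | 'wasm';

// Only the part of the WebGPU API this sketch needs.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

async function pickDevice(nav: NavigatorLike | undefined): Promise<Device> {
  // Browsers without WebGPU do not expose `navigator.gpu` at all.
  if (!nav || !nav.gpu) return 'wasm';

  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    // Treat any failure as "no GPU" and fall back.
    return 'wasm';
  }
}
```

In a browser, this could be called as `pickDevice(navigator as NavigatorLike)` before loading the model, so that unsupported browsers fall back to the slower path instead of failing.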