parakeet-mlx: Nvidia's Parakeet ASR Models on Apple Silicon with MLX

Summary
parakeet-mlx is an open-source project that implements Nvidia's Parakeet Automatic Speech Recognition (ASR) models for Apple Silicon, leveraging the MLX framework for optimized performance. This Python library offers both a command-line interface and a flexible Python API, enabling efficient transcription of audio files, including real-time streaming. It provides a practical solution for developers and researchers working with speech processing on Apple hardware.
Introduction
parakeet-mlx is an implementation of Nvidia's Parakeet Automatic Speech Recognition (ASR) models, optimized for Apple Silicon using the MLX framework. This open-source project lets users transcribe audio files efficiently, taking full advantage of Apple hardware.
With parakeet-mlx, you can easily convert speech to text using a straightforward command-line interface (CLI) or integrate advanced ASR capabilities into your Python applications. It supports various output options, including subtitles with word-level timestamps, and offers features like beam decoding, audio chunking for long files, and real-time streaming transcription.
Installation
Before installing, make sure you have ffmpeg installed on your system, as it is required for the CLI to work properly.
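If you are unsure whether ffmpeg is available, you can check from Python with the standard library before running the CLI. This is a generic `shutil.which` lookup, not part of parakeet-mlx:

```python
import shutil

# Look up ffmpeg on the PATH; returns its full path, or None if it is missing.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    print("ffmpeg not found - install it (e.g. via Homebrew: brew install ffmpeg)")
else:
    print(f"ffmpeg found at {ffmpeg_path}")
```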
Using uv (recommended):
To add as a project dependency:
uv add parakeet-mlx -U
Or, for the CLI globally:
uv tool install parakeet-mlx -U
Using pip:
pip install parakeet-mlx -U
Examples
CLI Quick Start
Transcribe a single audio file:
parakeet-mlx audio.mp3
Transcribe multiple files and generate VTT subtitles with word-level timestamps:
parakeet-mlx *.mp3 --output-format vtt --highlight-words
Generate all available output formats:
parakeet-mlx audio.mp3 --output-format all
Python API Quick Start
Transcribe a file:
from parakeet_mlx import from_pretrained
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")
result = model.transcribe("audio_file.wav")
print(result.text)
Check timestamps:
from parakeet_mlx import from_pretrained
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")
result = model.transcribe("audio_file.wav")
print(result.sentences)
# [AlignedSentence(text="Hello World.", start=1.01, end=2.04, duration=1.03, tokens=[...])]
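Since `start` and `end` are plain seconds, a common follow-up is formatting them as subtitle timestamps. A minimal sketch (the `seconds_to_vtt` helper below is illustrative, not part of the parakeet-mlx API):

```python
def seconds_to_vtt(t: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rem = divmod(t, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"

# e.g. for a sentence with start=1.01, end=2.04:
print(f"{seconds_to_vtt(1.01)} --> {seconds_to_vtt(2.04)}")
# 00:00:01.010 --> 00:00:02.040
```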
Do chunking:
from parakeet_mlx import from_pretrained
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")
result = model.transcribe("audio_file.wav", chunk_duration=60 * 2.0, overlap_duration=15.0)
print(result.sentences)
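To see how `chunk_duration` and `overlap_duration` interact, the time windows processed for a long file can be sketched as a sliding-window computation. This is a generic illustration assumed to mirror the library's chunking behavior, not code taken from it:

```python
def chunk_windows(total: float, chunk: float, overlap: float):
    """Yield (start, end) times in seconds for overlapping chunks of an audio file."""
    step = chunk - overlap  # each new chunk advances by chunk minus overlap
    start = 0.0
    while start < total:
        yield (start, min(start + chunk, total))
        if start + chunk >= total:
            break
        start += step

# A 300 s file with 120 s chunks and 15 s overlap:
print(list(chunk_windows(300.0, 120.0, 15.0)))
# [(0.0, 120.0), (105.0, 225.0), (210.0, 300.0)]
```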
Streaming Transcription:
For real-time transcription, use the transcribe_stream method:
from parakeet_mlx import from_pretrained
from parakeet_mlx.audio import load_audio

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

# Create a streaming context
with model.transcribe_stream(
    context_size=(256, 256),  # (left_context, right_context) frames
) as transcriber:
    # Simulate real-time audio chunks
    audio_data = load_audio("audio_file.wav", model.preprocessor_config.sample_rate)
    chunk_size = model.preprocessor_config.sample_rate  # 1 second chunks

    for i in range(0, len(audio_data), chunk_size):
        chunk = audio_data[i:i + chunk_size]
        transcriber.add_audio(chunk)

        # Access current transcription
        result = transcriber.result
        print(f"Current text: {result.text}")
Why Use parakeet-mlx?
parakeet-mlx stands out as an essential tool for anyone needing high-performance ASR capabilities on Apple Silicon devices.
- Optimized for Apple Silicon: By leveraging the MLX framework, parakeet-mlx delivers native, efficient performance, making it ideal for Mac users.
- High-Quality ASR: It implements Nvidia's Parakeet models, known for their accuracy and robustness in speech recognition.
- Versatility: Whether you prefer a command-line tool for quick tasks or a flexible Python API for integration into larger projects, parakeet-mlx has you covered.
- Advanced Features: From word- and sentence-level timestamps to beam decoding, audio chunking, and real-time streaming transcription, the project offers a rich set of functionality for diverse needs.
- Ease of Use: With clear installation instructions and comprehensive examples, it is accessible to both beginners and experienced developers.
Links
For more details, documentation, and to contribute to the project, visit the official GitHub repository: