parakeet-mlx: Nvidia's Parakeet ASR Models on Apple Silicon with MLX

Summary
parakeet-mlx is an open-source project that implements Nvidia's Parakeet Automatic Speech Recognition (ASR) models for Apple Silicon, leveraging the MLX framework for optimized performance. This Python library offers both a command-line interface and a flexible Python API, enabling efficient transcription of audio files, including real-time streaming. It provides a practical solution for developers and researchers working with speech processing on Apple hardware.
Introduction
parakeet-mlx is an implementation of Nvidia's Parakeet Automatic Speech Recognition (ASR) models, optimized for Apple Silicon using the MLX framework. This open-source project lets users transcribe audio files efficiently, taking full advantage of Apple hardware.
With parakeet-mlx, you can easily convert speech to text using a straightforward command-line interface (CLI) or integrate advanced ASR capabilities into your Python applications. It supports various output options, including subtitles with word-level timestamps, and offers features like beam decoding, audio chunking for long files, and real-time streaming transcription.
Installation
Before installing, make sure you have ffmpeg installed on your system, as it is required for the CLI to work properly.
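If you are unsure whether ffmpeg is available, you can check from Python with the standard library before running the CLI. This is a generic `shutil.which` lookup, not part of parakeet-mlx:

```python
import shutil

# Look up ffmpeg on the PATH; returns its full path, or None if it is missing.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    print("ffmpeg not found - install it (e.g. via Homebrew: brew install ffmpeg)")
else:
    print(f"ffmpeg found at {ffmpeg_path}")
```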
Using uv (recommended):
To add as a project dependency:
uv add parakeet-mlx -U
Or, for the CLI globally:
uv tool install parakeet-mlx -U
Using pip:
pip install parakeet-mlx -U
Examples
CLI Quick Start
Transcribe a single audio file:
parakeet-mlx audio.mp3
Transcribe multiple files and generate VTT subtitles with word-level timestamps:
parakeet-mlx *.mp3 --output-format vtt --highlight-words
Generate all available output formats:
parakeet-mlx audio.mp3 --output-format all
Python API Quick Start
Transcribe a file:
from parakeet_mlx import from_pretrained
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")
result = model.transcribe("audio_file.wav")
print(result.text)
Check timestamps:
from parakeet_mlx import from_pretrained
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")
result = model.transcribe("audio_file.wav")
print(result.sentences)
# [AlignedSentence(text="Hello World.", start=1.01, end=2.04, duration=1.03, tokens=[...])]
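Since `start` and `end` are plain seconds, a common follow-up is formatting them as subtitle timestamps. A minimal sketch (the `seconds_to_vtt` helper below is illustrative, not part of the parakeet-mlx API):

```python
def seconds_to_vtt(t: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rem = divmod(t, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"

# e.g. for a sentence with start=1.01, end=2.04:
print(f"{seconds_to_vtt(1.01)} --> {seconds_to_vtt(2.04)}")
# 00:00:01.010 --> 00:00:02.040
```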
Do chunking:
from parakeet_mlx import from_pretrained
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")
result = model.transcribe("audio_file.wav", chunk_duration=60 * 2.0, overlap_duration=15.0)
print(result.sentences)
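To see how `chunk_duration` and `overlap_duration` interact, the time windows processed for a long file can be sketched as a sliding-window computation. This is a generic illustration assumed to mirror the library's chunking behavior, not code taken from it:

```python
def chunk_windows(total: float, chunk: float, overlap: float):
    """Yield (start, end) times in seconds for overlapping chunks of an audio file."""
    step = chunk - overlap  # each new chunk advances by chunk minus overlap
    start = 0.0
    while start < total:
        yield (start, min(start + chunk, total))
        if start + chunk >= total:
            break
        start += step

# A 300 s file with 120 s chunks and 15 s overlap:
print(list(chunk_windows(300.0, 120.0, 15.0)))
# [(0.0, 120.0), (105.0, 225.0), (210.0, 300.0)]
```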
Streaming Transcription:
For real-time transcription, use the transcribe_stream method:
from parakeet_mlx import from_pretrained
from parakeet_mlx.audio import load_audio

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

# Create a streaming context
with model.transcribe_stream(
    context_size=(256, 256),  # (left_context, right_context) frames
) as transcriber:
    # Simulate real-time audio chunks
    audio_data = load_audio("audio_file.wav", model.preprocessor_config.sample_rate)
    chunk_size = model.preprocessor_config.sample_rate  # 1 second chunks

    for i in range(0, len(audio_data), chunk_size):
        chunk = audio_data[i:i + chunk_size]
        transcriber.add_audio(chunk)

        # Access current transcription
        result = transcriber.result
        print(f"Current text: {result.text}")
Why Use parakeet-mlx?
parakeet-mlx stands out as an essential tool for anyone needing high-performance ASR capabilities on Apple Silicon devices.
- Optimized for Apple Silicon: By leveraging the MLX framework, parakeet-mlx delivers native, efficient performance, making it ideal for Mac users.
- High-Quality ASR: It implements Nvidia's Parakeet models, known for their accuracy and robustness in speech recognition.
- Versatility: Whether you prefer a command-line tool for quick tasks or a flexible Python API for integration into larger projects, parakeet-mlx has you covered.
- Advanced Features: From word- and sentence-level timestamps to beam decoding, audio chunking, and real-time streaming transcription, the project offers a rich set of functionality for diverse needs.
- Ease of Use: With clear installation instructions and comprehensive examples, it is accessible to both beginners and experienced developers.
Links
For more details, documentation, and to contribute to the project, visit the official GitHub repository: