Repository History
4 repositories tagged with Audio Processing

Kapre: Keras Audio Preprocessors for Real-time GPU Processing
Kapre is a powerful Python library that provides Keras layers for real-time audio preprocessing directly on GPUs. It enables efficient computation of STFT, Melspectrograms, and other audio features within your deep learning models. This integration simplifies model deployment, allows for DSP parameter optimization, and ensures consistency compared to traditional pre-computation or custom implementations.

index-tts-lora: High-Quality Speech Synthesis with LoRA Fine-tuning
index-tts-lora offers a robust solution for high-quality speech synthesis, leveraging LoRA fine-tuning on the index-tts framework. It significantly enhances prosody and naturalness for both single and multi-speaker voices. This project provides practical methods for training and inference, making advanced voice synthesis more accessible.

Diffusion Studio Core: Browser-Based Video Compositing Engine
Diffusion Studio Core is a powerful, browser-based video compositing engine built with TypeScript. It leverages WebCodecs and Canvas2D for hardware-accelerated media processing directly within the browser. Designed for developers building non-linear editors, it supports both interactive playback for editing and high-fidelity rendering for final output across video, audio, and image workloads.

parakeet-mlx: Nvidia's Parakeet ASR Models on Apple Silicon with MLX
parakeet-mlx is an open-source project that implements Nvidia's advanced Automatic Speech Recognition (ASR) Parakeet models for Apple Silicon, leveraging the MLX framework for optimized performance. This Python library offers both a command-line interface and a flexible Python API, enabling efficient transcription of audio files, including real-time streaming capabilities. It provides a powerful solution for developers and researchers working with speech processing on Apple hardware.