Kapre: Keras Audio Preprocessors for Real-time GPU Processing
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
Kapre is a Python library of Keras layers that preprocess audio on the GPU in real time. It computes STFT, Melspectrograms, and other audio features inside your deep learning model. Because preprocessing is part of the model, deployment is simpler, DSP parameters can be optimized during training, and results are more consistent than with pre-computed features or custom implementations.
Introduction
Kapre provides Keras Audio Preprocessors: layers that compute audio features such as STFT, ISTFT, and Melspectrogram directly on the GPU in real time. Designed for Python 3.8+ with type hints, Kapre makes audio feature extraction a part of your Keras model rather than a separate step in your workflow.
Installation
Kapre can be easily installed using pip:
pip install kapre
Examples
Integrating Kapre into your Keras model is straightforward. Here's a one-shot example demonstrating how to add STFT and other processing layers to a sequential model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer
# 6 channels (!), maybe 1-sec audio signal, for an example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel()) # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer()
# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())
# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification
# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 44100, 6), matching the channels_last input_shape above
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!
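As a rough sanity check on the shapes in the example above, the following sketch estimates what the STFT layer would output for one input. It is not part of Kapre: `stft_output_shape` is a hypothetical helper, and it assumes the framing rule of `tf.signal.stft` with `pad_end=False` (no padding, one-sided FFT) and a `(time, freq, channel)` ordering for `channels_last` data.

```python
def stft_output_shape(n_samples, n_channels, n_fft, win_length, hop_length):
    """Estimate the (time, freq, channel) shape of an STFT output, batch excluded.

    Hypothetical helper for illustration only. Assumes no padding at either
    end of the signal and a one-sided FFT.
    """
    if n_samples < win_length:
        # Not even one full window fits into the signal.
        n_frames = 0
    else:
        # One frame for the first window, plus one per full hop that still fits.
        n_frames = 1 + (n_samples - win_length) // hop_length
    # A real-valued signal yields n_fft // 2 + 1 non-redundant frequency bins.
    n_freq_bins = n_fft // 2 + 1
    return (n_frames, n_freq_bins, n_channels)


# Same settings as the example model: 1 second of 6-channel audio at 44.1 kHz.
shape = stft_output_shape(44100, 6, n_fft=2048, win_length=2018, hop_length=1024)
print(shape)  # (42, 1025, 6)
```

Under these assumptions, each 44100-sample input would become 42 frames of 1025 frequency bins per channel before reaching the Conv2D layer. Checking `model.summary()` on the real model remains the authoritative way to confirm the shapes.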
For more examples and detailed usage, refer to the example folder in the GitHub repository.
Why Use Kapre?
Kapre offers significant advantages over traditional audio preprocessing methods:
Versus Pre-computation
- You can optimize DSP parameters directly within your model training.
- Model deployment becomes simpler and more consistent.
- Your code and model have fewer external dependencies.
Versus Your Own Implementation
- Quick and Easy: Integrate complex audio processing with minimal effort.
- Consistency: Handles 1D/2D TensorFlow batch shapes consistently and is data format agnostic (channels_first and channels_last).
- Less Error Prone: Kapre layers are tested against established libraries like Librosa, ensuring accuracy in tricky operations like STFT and decibel conversion.
- Extended APIs: Provides functionality beyond the default tf.signal implementations, such as a perfectly invertible STFT and InverseSTFT pair, and Mel-spectrogram with more options.
- Reproducibility: Available on pip with versioning for reliable use.
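To illustrate why decibel conversion counts as a tricky operation, here is a minimal pure-Python sketch of one common power-to-decibel convention. It is not Kapre's implementation: the function name `power_to_decibel` and the parameters `ref_value`, `amin`, and `dynamic_range` are assumptions chosen for illustration. The edge cases it handles (zero values, the reference level, limiting the range below the peak) are the kind of details that are easy to get subtly wrong in a custom implementation.

```python
import math


def power_to_decibel(values, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
    """Convert non-negative power values to decibels relative to ref_value.

    Hypothetical helper for illustration only.
    - amin floors the input so that log10(0) is never evaluated.
    - dynamic_range clips the result to at most that many dB below the peak.
    """
    if not values:
        return []
    ref_db = 10.0 * math.log10(max(ref_value, amin))
    db = [10.0 * math.log10(max(v, amin)) - ref_db for v in values]
    # Clip everything that falls more than dynamic_range dB below the loudest value.
    floor_db = max(db) - dynamic_range
    return [max(d, floor_db) for d in db]


# Full-scale power, one tenth of it, and silence.
print(power_to_decibel([1.0, 0.1, 0.0]))
```

With the default settings, the three inputs map to roughly 0 dB, -10 dB, and -50 dB; the last value comes from the `amin` floor rather than from the input itself.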
Links
- GitHub Repository: https://github.com/keunwoochoi/kapre
- API Documentation: https://kapre.readthedocs.io
- Citation: If you use Kapre in your work, please cite the following paper:
@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}