pyAudioAnalysis: A Python Library for Audio Feature Extraction and Analysis

Introduction

pyAudioAnalysis is a Python library that covers a wide range of audio analysis tasks. It extracts audio features and representations such as MFCCs, spectrograms, and chromagrams. It also trains, tunes, and evaluates classifiers of audio segments, which can then classify unknown sounds and detect specific audio events. The library supports supervised and unsupervised segmentation (for example, speaker diarization) and can train and apply audio regression models, with emotion recognition as one example application. Finally, it includes dimensionality-reduction tools for visualizing audio data and content similarities.

Installation

Getting started with pyAudioAnalysis is straightforward. Follow these steps to install the library:

Clone the repository:

```
git clone https://github.com/tyiannak/pyAudioAnalysis.git
```

Install dependencies: Navigate into the cloned directory and install the required packages.
```
cd pyAudioAnalysis
pip install -r ./requirements.txt
```
Install the library:
```
pip install -e .
```
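
To confirm that the install step worked, a quick check along these lines may help. It is a minimal sketch using only the standard library. The helper name `is_installed` is hypothetical and not part of pyAudioAnalysis. It only tests whether the package can be located for import, not whether every dependency is present.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be located for import."""
    # find_spec returns None when no matching top-level package is found.
    return importlib.util.find_spec(package_name) is not None


# Hypothetical helper applied to the package name used in the import examples.
print("pyAudioAnalysis importable:", is_installed("pyAudioAnalysis"))
```

If this prints `False`, re-running the `pip install -e .` step from inside the cloned directory is the first thing to try.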

Examples

pyAudioAnalysis provides easy-to-use Python wrappers for its audio analysis tasks, along with command-line support for all of its functionality.

Audio Classification Example

This Python code snippet demonstrates how to train an audio segment classifier using WAV files organized into folders (each representing a different class), and then use the trained classifier to categorize an unknown audio WAV file.

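```
from pyAudioAnalysis import audioTrainTest as aT

# Train an SVM on two class folders (1.0 s mid-term window and step); the model is saved as "svmSMtemp".
aT.extract_features_and_train(["classifierData/music", "classifierData/speech"], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "svm", "svmSMtemp", False)
# Classify an unknown WAV file with the saved model.
aT.file_classification("data/doremi.wav", "svmSMtemp", "svm")
```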

The result of the classification would be similar to:

```
(0.0, array([ 0.90156761, 0.09843239]), ['music', 'speech'])
```
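
As a minimal sketch of how that result can be read: the tuple holds the winning class index, the per-class probabilities, and the class names, in that order. The helper name `describe_prediction` is hypothetical and not part of pyAudioAnalysis. A plain list stands in for the NumPy array so the snippet runs without the library installed.

```python
def describe_prediction(result):
    """Turn a (class_id, probabilities, class_names) tuple into (label, probability)."""
    class_id, probabilities, class_names = result
    index = int(class_id)  # the class index arrives as a float such as 0.0
    return class_names[index], float(probabilities[index])


# Values copied from the example output; a list replaces the NumPy array (assumption for portability).
example = (0.0, [0.90156761, 0.09843239], ["music", "speech"])
label, confidence = describe_prediction(example)
print(f"{label} ({confidence:.1%})")  # prints "music (90.2%)"
```

In real use, the tuple returned by `aT.file_classification(...)` would be passed in place of `example`.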

Command-Line Usage

For command-line enthusiasts, pyAudioAnalysis provides direct access to its features. For instance, to extract the spectrogram of an audio signal stored in a WAV file, you can use:

```
python audioAnalysis.py fileSpectrogram -i data/doremi.wav
```

For more detailed examples and comprehensive tutorials, refer to the official pyAudioAnalysis Wiki.

Why Use pyAudioAnalysis

pyAudioAnalysis stands out as a robust choice for audio analysis due to several key advantages:

Comprehensive Functionality: It covers a broad spectrum of audio analysis tasks, from basic feature extraction to advanced machine learning applications like classification, segmentation, and regression, all within a single library.
Ease of Use: The library is designed with user-friendliness in mind, offering simple Python wrappers and extensive command-line support, making it accessible to both beginners and experienced developers.
Machine Learning Integration: It seamlessly integrates various machine learning models, allowing users to train, evaluate, and deploy classifiers and regression models for diverse audio-related problems.
Active Development and Rich Resources: The project is actively maintained, as evidenced by recent news and updates. It also provides a wealth of documentation, including a detailed wiki and several insightful articles, to guide users.
Open-Source and Extensible: Being an open-source project, pyAudioAnalysis is freely available, encouraging community contributions and allowing for customization and extension to suit specific research or application needs.