Magenta RT: Live Music Generation on Your Local Device
Summary
Magenta RealTime (Magenta RT) is an open-source Python library for live music audio generation on local devices. It allows users to create music using both text and audio prompts, serving as a powerful tool for real-time creative audio exploration. This library is the on-device companion to Google's MusicFX DJ Mode and the Lyria RealTime API.
Introduction
Magenta RealTime (Magenta RT) is an open-source Python library designed for live music audio generation directly on your local device. It serves as the on-device companion to Google's MusicFX DJ Mode and the Lyria RealTime API, enabling users to create music through both text and audio prompts. Magenta RT generates audio in short chunks (2 seconds) given a finite amount of past context (10 seconds), utilizing crossfading to mitigate boundary artifacts between chunks.
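The chunked-generation idea can be illustrated with a minimal sketch: consecutive chunks are joined with a short crossfade so no click appears at the boundary. This is plain NumPy with a linear fade; the chunk and overlap lengths here are illustrative, not Magenta RT's actual values.

```python
import numpy as np

def crossfade_concat(chunks, overlap):
    """Concatenate audio chunks, linearly crossfading `overlap` samples
    at each boundary to mitigate boundary artifacts."""
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in
    out = chunks[0].astype(np.float64)
    for chunk in chunks[1:]:
        chunk = chunk.astype(np.float64)
        # Blend the tail of the running output with the head of the next chunk.
        out[-overlap:] = out[-overlap:] * fade_out + chunk[:overlap] * fade_in
        out = np.concatenate([out, chunk[overlap:]])
    return out

# Two toy "chunks" of constant signal: the crossfaded result stays constant,
# so the seam between them is inaudible.
a = np.ones(100)
b = np.ones(100)
y = crossfade_concat([a, b], overlap=10)
```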
Installation
Getting started with Magenta RT is straightforward, offering options for cloud-based demos or local setup.
The fastest way to experience Magenta RT is through its official Colab Demo, which runs in real-time on freely available TPUs. Additionally, there are Colab demos supporting live audio input and customization via finetuning.
For local execution, you can choose between using Docker or performing a native installation.
Running Locally via Docker
This method requires Linux, Docker, and a GPU with at least 40 GB of memory.
mkdir -p ~/.cache/magenta_rt
docker run -it \
  --gpus device=0 \
  -v ~/.cache/magenta_rt:/magenta-realtime/cache \
  -p 8000:8000 \
  us-docker.pkg.dev/brain-magenta/magenta-rt/magenta-rt:gpu
After running the command, open the web demo at http://localhost:8000.
Local Installation
If you prefer to run Magenta RT natively rather than using Docker, follow these instructions.
Step 1: Install Python 3.12
sudo apt update
sudo apt install software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.12 python3.12-venv python3.12-dev -y
Step 2: Install Magenta RT for GPU
# Clone Magenta RT
git clone https://github.com/magenta/magenta-realtime.git
cd magenta-realtime
# Create a virtual environment
python3.12 -m venv .venv
source .venv/bin/activate
# Patch and install t5x
git clone https://github.com/google-research/t5x.git && \
pushd t5x && \
git checkout 7781d16 && \
patch setup.py < ../patch/t5x_setup.py.patch && \
patch t5x/partitioning.py < ../patch/t5x_partitioning.py.patch && \
pip install .[gpu] && \
popd
# Install Magenta RT
pip install -e .[gpu] && pip install tf2jax==0.3.8
# Patch seqio to remove the tensorflow-text dependency
patch .venv/lib/python3.12/site-packages/seqio/vocabularies.py < patch/seqio_vocabularies.py.patch
Step 2 (alternative): Install Magenta RT for TPU
# Create a virtual environment
python3.12 -m venv .venv
source .venv/bin/activate
# Install Magenta RT
git clone https://github.com/magenta/magenta-realtime.git
pip install -e magenta-realtime/[tpu] && pip install tf2jax==0.3.8 huggingface_hub
Step 3: Generate!
python -m magenta_rt.generate \
  --prompt="blissful ambient synth" \
  --output="./output.mp3"
Examples
Magenta RT offers powerful capabilities for music generation and style blending.
Generating audio with Magenta RT
from magenta_rt import audio, system
from IPython.display import display, Audio

num_seconds = 10
mrt = system.MagentaRT()

# Embed a text prompt into a style vector.
style = system.embed_style('funk')

# Generate audio one chunk at a time, threading the state through
# so each chunk continues from the previous one.
chunks = []
state = None
for i in range(round(num_seconds / mrt.config.chunk_length)):
    state, chunk = mrt.generate_chunk(state=state, style=style)
    chunks.append(chunk)

# Concatenate the chunks into one waveform and play it in the notebook.
generated = audio.concatenate(chunks)
display(Audio(generated.samples.swapaxes(0, 1), rate=mrt.sample_rate))
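Outside a notebook, the generated waveform can be written to disk instead of displayed. A minimal sketch using the standard-library `wave` module, assuming (as the `swapaxes` call above suggests) that the samples are a float array in [-1, 1] with shape `(num_samples, num_channels)`:

```python
import wave
import numpy as np

def write_wav(path, samples, sample_rate):
    """Write float samples in [-1, 1], shape (num_samples, num_channels),
    as a 16-bit PCM WAV file."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype('<i2')
    with wave.open(path, 'wb') as f:
        f.setnchannels(pcm.shape[1])
        f.setsampwidth(2)            # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())

# Toy example: one second of stereo silence at 48 kHz.
write_wav('output.wav', np.zeros((48000, 2)), 48000)
```

In practice you would pass `generated.samples` and `mrt.sample_rate` from the snippet above.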
Blending text and audio styles with MusicCoCa
from magenta_rt import audio, musiccoca
import numpy as np

style_model = musiccoca.MusicCoCa()

# Mix an audio reference with a text prompt, weighting the audio 2:1.
my_audio = audio.Waveform.from_file('myjam.mp3')
weighted_styles = [
    (2.0, my_audio),
    (1.0, 'heavy metal'),
]

# Embed each style (audio or text) into the shared MusicCoCa space.
weights = np.array([w for w, _ in weighted_styles])
styles = style_model.embed([s for _, s in weighted_styles])

# Blend as a weighted average: the weights are normalized to sum to 1,
# so a plain sum of the scaled embeddings gives the average.
weights_norm = weights / weights.sum()
blended = (weights_norm[:, np.newaxis] * styles).sum(axis=0)
Tokenizing audio with SpectroStream
from magenta_rt import audio, spectrostream

codec = spectrostream.SpectroStream()

# Encode a waveform to discrete tokens, then decode back to audio.
my_audio = audio.Waveform.from_file('jam.mp3')
my_tokens = codec.encode(my_audio)
my_audio_reconstruction = codec.decode(my_tokens)
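Conceptually, a neural codec maps audio frames to discrete codebook indices and back. The round trip can be sketched with toy vector quantization; this illustrates the tokenize/detokenize idea only, not SpectroStream's actual architecture, and the codebook here is random.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 8))   # 256 entries, 8-dim toy "frames"

def encode(frames):
    # Nearest codebook entry per frame -> discrete token ids.
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def decode(tokens):
    # Token ids -> codebook vectors (a lossy reconstruction).
    return codebook[tokens]

frames = rng.normal(size=(16, 8))
tokens = encode(frames)
recon = decode(tokens)
```

The reconstruction is lossy (each frame snaps to its nearest codebook entry), which is why a language model can operate on the compact token sequence instead of raw samples.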
Why Use Magenta RT?
Magenta RT provides a unique platform for real-time, on-device music generation. Its ability to respond to both text and audio prompts makes it incredibly flexible for creators. The integration with powerful models like MusicCoCa for style blending and SpectroStream for audio tokenization offers advanced capabilities for sophisticated audio manipulation. Being open-source, it allows developers and musicians to experiment, customize, and integrate live music generation into their projects.
Links
- GitHub Repository: https://github.com/magenta/magenta-realtime
- Official Blog Post: https://g.co/magenta/rt
- Research Paper: https://arxiv.org/abs/2508.04651
- Model Card: https://github.com/magenta/magenta-realtime/blob/main/MODEL.md
- Colab Demo (Main): https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Demo.ipynb
- Colab Demo (Audio Injection): https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Audio_Injection.ipynb
- Colab Demo (Finetuning): https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Finetune.ipynb
- YouTube Video (Text Prompting): https://www.youtube.com/watch?v=Ae1Kz2zmh9M
- YouTube Video (Audio Prompting): https://www.youtube.com/watch?v=vHIf2UKXmp4
- YouTube Video (Colab Walkthrough): https://www.youtube.com/watch?v=SVTuEdeepVs