Magenta RT: Live Music Generation on Your Local Device
Summary
Magenta RealTime (Magenta RT) is an open-source Python library for live music audio generation on local devices. It allows users to create music using both text and audio prompts, serving as a powerful tool for real-time creative audio exploration. This library is the on-device companion to Google's MusicFX DJ Mode and the Lyria RealTime API.
Introduction
Magenta RealTime (Magenta RT) is an open-source Python library designed for live music audio generation directly on your local device. It serves as the on-device companion to Google's MusicFX DJ Mode and the Lyria RealTime API, enabling users to create music through both text and audio prompts. Magenta RT generates audio in short chunks (2 seconds) given a finite amount of past context (10 seconds), utilizing crossfading to mitigate boundary artifacts between chunks.
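The chunked-generation idea can be illustrated with a minimal sketch: consecutive chunks are joined with a short crossfade so no click appears at the boundary. This is plain NumPy with a linear fade; the chunk and overlap lengths here are illustrative, not Magenta RT's actual values.

```python
import numpy as np

def crossfade_concat(chunks, overlap):
    """Concatenate audio chunks, linearly crossfading `overlap` samples
    at each boundary to mitigate boundary artifacts."""
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in
    out = chunks[0].astype(np.float64)
    for chunk in chunks[1:]:
        chunk = chunk.astype(np.float64)
        # Blend the tail of the running output with the head of the next chunk.
        out[-overlap:] = out[-overlap:] * fade_out + chunk[:overlap] * fade_in
        out = np.concatenate([out, chunk[overlap:]])
    return out

# Two toy "chunks" of constant signal: the crossfaded result stays constant,
# so the seam between them is inaudible.
a = np.ones(100)
b = np.ones(100)
y = crossfade_concat([a, b], overlap=10)
```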
Installation
Getting started with Magenta RT is straightforward, offering options for cloud-based demos or local setup.
The fastest way to experience Magenta RT is through its official Colab Demo, which runs in real-time on freely available TPUs. Additionally, there are Colab demos supporting live audio input and customization via finetuning.
For local execution, you can choose between using Docker or performing a native installation.
Running Locally via Docker
This method requires Linux, Docker, and a GPU with at least 40 GB of memory.
mkdir -p ~/.cache/magenta_rt
docker run -it \
  --gpus device=0 \
  -v ~/.cache/magenta_rt:/magenta-realtime/cache \
  -p 8000:8000 \
  us-docker.pkg.dev/brain-magenta/magenta-rt/magenta-rt:gpu
After running the command, open the web demo at http://localhost:8000.
Local Installation
If you prefer to run Magenta RT natively rather than using Docker, follow these instructions.
Step 1: Install Python 3.12
sudo apt update
sudo apt install software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.12 python3.12-venv python3.12-dev -y
Step 2: Install Magenta RT for GPU
# Clone Magenta RT
git clone https://github.com/magenta/magenta-realtime.git
cd magenta-realtime
# Create a virtual environment
python3.12 -m venv .venv
source .venv/bin/activate
# Patch and install t5x
git clone https://github.com/google-research/t5x.git && \
pushd t5x && \
git checkout 7781d16 && \
patch setup.py < ../patch/t5x_setup.py.patch && \
patch t5x/partitioning.py < ../patch/t5x_partitioning.py.patch && \
pip install .[gpu] && \
popd
# Install Magenta RT
pip install -e .[gpu] && pip install tf2jax==0.3.8
# Patch seqio to remove the tensorflow-text dependency
patch .venv/lib/python3.12/site-packages/seqio/vocabularies.py < patch/seqio_vocabularies.py.patch
Step 2 (alternative): Install Magenta RT for TPU
# Create a virtual environment
python3.12 -m venv .venv
source .venv/bin/activate
# Install Magenta RT
git clone https://github.com/magenta/magenta-realtime.git
pip install -e magenta-realtime/[tpu] && pip install tf2jax==0.3.8 huggingface_hub
Step 3: Generate!
python -m magenta_rt.generate \
  --prompt="blissful ambient synth" \
  --output="./output.mp3"
Examples
Magenta RT offers powerful capabilities for music generation and style blending.
Generating audio with Magenta RT
from magenta_rt import audio, system
from IPython.display import display, Audio

num_seconds = 10
mrt = system.MagentaRT()

# Embed a text prompt into a style vector.
style = system.embed_style('funk')

# Generate audio one chunk at a time, threading the state through
# so each chunk continues from the previous one.
chunks = []
state = None
for i in range(round(num_seconds / mrt.config.chunk_length)):
    state, chunk = mrt.generate_chunk(state=state, style=style)
    chunks.append(chunk)

# Concatenate the chunks into one waveform and play it in the notebook.
generated = audio.concatenate(chunks)
display(Audio(generated.samples.swapaxes(0, 1), rate=mrt.sample_rate))
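Outside a notebook, the generated waveform can be written to disk instead of displayed. A minimal sketch using the standard-library `wave` module, assuming (as the `swapaxes` call above suggests) that the samples are a float array in [-1, 1] with shape `(num_samples, num_channels)`:

```python
import wave
import numpy as np

def write_wav(path, samples, sample_rate):
    """Write float samples in [-1, 1], shape (num_samples, num_channels),
    as a 16-bit PCM WAV file."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype('<i2')
    with wave.open(path, 'wb') as f:
        f.setnchannels(pcm.shape[1])
        f.setsampwidth(2)            # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())

# Toy example: one second of stereo silence at 48 kHz.
write_wav('output.wav', np.zeros((48000, 2)), 48000)
```

In practice you would pass `generated.samples` and `mrt.sample_rate` from the snippet above.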
Blending text and audio styles with MusicCoCa
from magenta_rt import audio, musiccoca
import numpy as np

style_model = musiccoca.MusicCoCa()

# Mix an audio reference with a text prompt, weighting the audio 2:1.
my_audio = audio.Waveform.from_file('myjam.mp3')
weighted_styles = [
    (2.0, my_audio),
    (1.0, 'heavy metal'),
]

# Embed each style (audio or text) into the shared MusicCoCa space.
weights = np.array([w for w, _ in weighted_styles])
styles = style_model.embed([s for _, s in weighted_styles])

# Blend as a weighted average: the weights are normalized to sum to 1,
# so a plain sum of the scaled embeddings gives the average.
weights_norm = weights / weights.sum()
blended = (weights_norm[:, np.newaxis] * styles).sum(axis=0)
Tokenizing audio with SpectroStream
from magenta_rt import audio, spectrostream

codec = spectrostream.SpectroStream()

# Encode a waveform to discrete tokens, then decode back to audio.
my_audio = audio.Waveform.from_file('jam.mp3')
my_tokens = codec.encode(my_audio)
my_audio_reconstruction = codec.decode(my_tokens)
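Conceptually, a neural codec maps audio frames to discrete codebook indices and back. The round trip can be sketched with toy vector quantization; this illustrates the tokenize/detokenize idea only, not SpectroStream's actual architecture, and the codebook here is random.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 8))   # 256 entries, 8-dim toy "frames"

def encode(frames):
    # Nearest codebook entry per frame -> discrete token ids.
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def decode(tokens):
    # Token ids -> codebook vectors (a lossy reconstruction).
    return codebook[tokens]

frames = rng.normal(size=(16, 8))
tokens = encode(frames)
recon = decode(tokens)
```

The reconstruction is lossy (each frame snaps to its nearest codebook entry), which is why a language model can operate on the compact token sequence instead of raw samples.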
Why Use Magenta RT?
Magenta RT provides a unique platform for real-time, on-device music generation. Its ability to respond to both text and audio prompts makes it incredibly flexible for creators. The integration with powerful models like MusicCoCa for style blending and SpectroStream for audio tokenization offers advanced capabilities for sophisticated audio manipulation. Being open-source, it allows developers and musicians to experiment, customize, and integrate live music generation into their projects.
Links
- GitHub Repository: https://github.com/magenta/magenta-realtime
- Official Blog Post: https://g.co/magenta/rt
- Research Paper: https://arxiv.org/abs/2508.04651
- Model Card: https://github.com/magenta/magenta-realtime/blob/main/MODEL.md
- Colab Demo (Main): https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Demo.ipynb
- Colab Demo (Audio Injection): https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Audio_Injection.ipynb
- Colab Demo (Finetuning): https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Finetune.ipynb
- YouTube Video (Text Prompting): https://www.youtube.com/watch?v=Ae1Kz2zmh9M
- YouTube Video (Audio Prompting): https://www.youtube.com/watch?v=vHIf2UKXmp4
- YouTube Video (Colab Walkthrough): https://www.youtube.com/watch?v=SVTuEdeepVs