StreamDiffusion: Real-Time Interactive Generation with Diffusion Pipelines

Summary

StreamDiffusion is a diffusion pipeline designed for real-time interactive generation, significantly improving the throughput of existing diffusion-based image generation techniques. It offers a pipeline-level solution for high-speed image-to-image and text-to-image generation, making interactive AI experiences more accessible. The project introduces several key features that optimize computational efficiency and GPU utilization.

Introduction

StreamDiffusion is a diffusion pipeline developed by cumulo-autumn that provides a pipeline-level solution for real-time interactive generation. The project aims to make diffusion-based image generation fast and responsive enough for interactive AI art creation. It was introduced in the paper 'StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation' and is built on a set of pipeline-level optimization strategies.

Installation

To get started with StreamDiffusion, follow these steps:

Step 0: Clone the Repository

git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion

Step 1: Make Environment

You can create the environment with conda or venv, install StreamDiffusion via pip (Step 3), or use Docker (see below).

Using Conda:

conda create -n streamdiffusion python=3.10
conda activate streamdiffusion

Using Venv:

python -m venv .venv
# Windows
.\.venv\Scripts\activate
# Linux
source .venv/bin/activate

Step 2: Install PyTorch

Select the appropriate version for your system.

CUDA 11.8:

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118

CUDA 12.1:

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121

For more details, visit the PyTorch website.
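
Before moving on, it can be worth confirming that the installed PyTorch build sees your GPU. This one-liner uses only standard PyTorch calls and is not specific to StreamDiffusion:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"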

Step 3: Install StreamDiffusion

For Users

Install StreamDiffusion:

# For Latest Version (recommended)
pip install git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]

# Or for Stable Version
pip install streamdiffusion[tensorrt]

Install TensorRT extension:

python -m streamdiffusion.tools.install-tensorrt

(Windows only) If you installed the Stable Version (pip install streamdiffusion[tensorrt]), you may also need to force-reinstall pywin32:

pip install --force-reinstall pywin32

For Developers

python setup.py develop easy_install streamdiffusion[tensorrt]
python -m streamdiffusion.tools.install-tensorrt

Docker Installation (TensorRT Ready)

git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion
docker build -t stream-diffusion:latest -f Dockerfile .
docker run --gpus all -it -v $(pwd):/home/ubuntu/streamdiffusion stream-diffusion:latest

Examples

StreamDiffusion provides various examples to demonstrate its capabilities, including real-time text-to-image and image-to-image generation. You can find more detailed examples in the examples directory of the repository.

Image-to-Image Example

This example shows how to use StreamDiffusion for real-time image-to-image generation:

import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from diffusers.utils import load_image

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion, denoising only at the listed timestep indices
stream = StreamDiffusion(
    pipe,
    t_index_list=[32, 45],
    torch_dtype=torch.float16,
)

# If the loaded model is not an LCM model, merge LCM-LoRA for few-step sampling
stream.load_lcm_lora()
stream.fuse_lora()
# Use a Tiny VAE (TAESD) for faster decoding
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

prompt = "1girl with dog hair, thick frame glasses"
# Prepare the stream with the prompt
stream.prepare(prompt)

# Load and resize the input image
init_image = load_image("assets/img2img_example.png").resize((512, 512))

# Warmup: run at least len(t_index_list) iterations to fill the stream batch
for _ in range(2):
    stream(init_image)

# Run the stream interactively
while True:
    x_output = stream(init_image)
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break

Text-to-Image Example

Here is a basic example for real-time text-to-image generation:

import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline; txt2img here uses four denoising steps and disables CFG
stream = StreamDiffusion(
    pipe,
    t_index_list=[0, 16, 32, 45],
    torch_dtype=torch.float16,
    cfg_type="none",
)

# If the loaded model is not an LCM model, merge LCM-LoRA for few-step sampling
stream.load_lcm_lora()
stream.fuse_lora()
# Use a Tiny VAE (TAESD) for faster decoding
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

prompt = "1girl with dog hair, thick frame glasses"
# Prepare the stream with the prompt
stream.prepare(prompt)

# Warmup: run at least len(t_index_list) iterations to fill the stream batch
for _ in range(4):
    stream()

# Generate images interactively
while True:
    x_output = stream.txt2img()
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break

Faster Generation with TensorRT

To achieve even faster generation, you can integrate TensorRT acceleration:

from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

stream = accelerate_with_tensorrt(
    stream, "engines", max_batch_size=2,
)

Building the engine requires the TensorRT extension and takes time up front, but once built it significantly boosts inference speed.
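
The accelerated stream keeps the same calling interface as before, so, assuming the txt2img setup from the earlier example, it can be driven unchanged (a sketch of typical usage, not additional required code):

# The accelerated stream is used exactly like the original one
stream.prepare("1girl with dog hair, thick frame glasses")
for _ in range(4):
    stream()  # warmup passes to fill the stream batch
image = postprocess_image(stream.txt2img(), output_type="pil")[0]
image.show()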

Real-Time Demos

StreamDiffusion includes interactive demos for both text-to-image and image-to-image generation.

  • Real-Time Txt2Img Demo
  • Real-Time Img2Img Demo (webcam/screen capture)

Why Use StreamDiffusion

StreamDiffusion stands out due to its innovative approach to optimizing diffusion models for real-time applications. Its core strength lies in a suite of features designed to maximize efficiency and speed.

Key Features

  • Stream Batch: Rather than waiting for one image to finish all of its denoising steps, consecutive inputs at different denoising stages are batched into a single U-Net pass, raising throughput.
  • Residual Classifier-Free Guidance (RCFG): An improved guidance mechanism that reduces the number of U-Net evaluations needed for negative conditioning compared to conventional CFG. It supports "Self-Negative" and "Onetime-Negative" configurations.
  • Stochastic Similarity Filter: Improves GPU utilization by probabilistically skipping computation when consecutive frames barely change, which is ideal for video inputs such as webcams (see the sketch after this list).
  • IO Queues: Decouples input and output handling from the denoising loop for smoother execution.
  • Pre-Computation for KV-Caches: Computes prompt-dependent intermediate results ahead of time to accelerate per-frame processing.
  • Model Acceleration Tools: Integrates tools such as TensorRT and the Tiny VAE (TAESD) for additional speedups.
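
To make the Stochastic Similarity Filter concrete, here is a minimal, hypothetical sketch of the idea (an illustration only, not the library's implementation; the function name and threshold are made up): the denoising step is skipped with a probability that grows as consecutive frames become more similar.

import torch
import torch.nn.functional as F

def maybe_skip(prev_frame: torch.Tensor, cur_frame: torch.Tensor, threshold: float = 0.98) -> bool:
    # Cosine similarity between the flattened frames (1.0 = identical)
    sim = F.cosine_similarity(prev_frame.flatten(), cur_frame.flatten(), dim=0).item()
    if sim <= threshold:
        return False  # frames differ enough: always run the pipeline
    # Map similarity in (threshold, 1] onto a skip probability in (0, 1]
    skip_prob = (sim - threshold) / (1.0 - threshold)
    return torch.rand(()).item() < skip_prob

When a frame is judged static, the previous output can simply be shown again instead of re-running the U-Net, which is where the GPU savings on near-static video come from.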

Performance Benchmarks

Measured on an RTX 4090 GPU with a Core i9-13900K CPU under Ubuntu 22.04.3 LTS, StreamDiffusion reaches the following frame rates:

Model                 Denoising Steps   FPS (Txt2Img)   FPS (Img2Img)
SD-turbo              1                 106.16          93.897
LCM-LoRA + KohakuV2   4                 38.023          37.133

These benchmarks demonstrate StreamDiffusion's capacity for high-speed generation: at 106.16 fps, each frame takes roughly 9.4 ms end to end, fast enough for genuinely interactive AI applications.

Links

For more information, see the GitHub repository (https://github.com/cumulo-autumn/StreamDiffusion) and the paper 'StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation' (arXiv:2312.12491).