StreamDiffusion: Real-Time Interactive Generation with Diffusion Pipelines

Summary

StreamDiffusion is a diffusion pipeline designed for real-time interactive generation, significantly improving the throughput of existing diffusion-based image generation techniques. It offers a pipeline-level solution for high-speed image-to-image and text-to-image generation, making interactive AI experiences more accessible. The project introduces several key features that optimize computational efficiency and GPU utilization.

Introduction

StreamDiffusion is a diffusion pipeline developed by cumulo-autumn that provides a pipeline-level solution for real-time interactive generation. The project aims to make diffusion-based image generation fast and responsive enough for interactive AI art creation. It was introduced in the paper 'StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation' and is built on a set of pipeline-level optimization strategies.

Installation

To get started with StreamDiffusion, follow these steps:

Step 0: Clone the Repository

git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion

Step 1: Make Environment

You can create the environment with conda or venv, install StreamDiffusion via pip (Step 3), or use Docker (see below).

Using Conda:

conda create -n streamdiffusion python=3.10
conda activate streamdiffusion

Using Venv:

python -m venv .venv
# Windows
.\.venv\Scripts\activate
# Linux
source .venv/bin/activate

Step 2: Install PyTorch

Select the appropriate version for your system.

CUDA 11.8:

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118

CUDA 12.1:

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121

For more details, visit the PyTorch website.
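
Before moving on, it can be worth confirming that the installed PyTorch build sees your GPU. This one-liner uses only standard PyTorch calls and is not specific to StreamDiffusion:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"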

Step 3: Install StreamDiffusion

For Users

Install StreamDiffusion:

# For Latest Version (recommended)
pip install git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]

# Or for Stable Version
pip install streamdiffusion[tensorrt]

Install TensorRT extension:

python -m streamdiffusion.tools.install-tensorrt

(Windows only) If you installed the Stable Version (pip install streamdiffusion[tensorrt]), you may also need to force-reinstall pywin32:

pip install --force-reinstall pywin32

For Developers

python setup.py develop easy_install streamdiffusion[tensorrt]
python -m streamdiffusion.tools.install-tensorrt

Docker Installation (TensorRT Ready)

git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion
docker build -t stream-diffusion:latest -f Dockerfile .
docker run --gpus all -it -v $(pwd):/home/ubuntu/streamdiffusion stream-diffusion:latest

Examples

StreamDiffusion provides various examples to demonstrate its capabilities, including real-time text-to-image and image-to-image generation. You can find more detailed examples in the examples directory of the repository.

Image-to-Image Example

This example shows how to use StreamDiffusion for real-time image-to-image generation:

import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from diffusers.utils import load_image

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion, denoising only at the listed timestep indices
stream = StreamDiffusion(
    pipe,
    t_index_list=[32, 45],
    torch_dtype=torch.float16,
)

# If the loaded model is not an LCM model, merge LCM-LoRA for few-step sampling
stream.load_lcm_lora()
stream.fuse_lora()
# Use a Tiny VAE (TAESD) for faster decoding
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

prompt = "1girl with dog hair, thick frame glasses"
# Prepare the stream with the prompt
stream.prepare(prompt)

# Load and resize the input image
init_image = load_image("assets/img2img_example.png").resize((512, 512))

# Warmup: run at least len(t_index_list) iterations to fill the stream batch
for _ in range(2):
    stream(init_image)

# Run the stream interactively
while True:
    x_output = stream(init_image)
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break

Text-to-Image Example

Here is a basic example for real-time text-to-image generation:

import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline; txt2img here uses four denoising steps and disables CFG
stream = StreamDiffusion(
    pipe,
    t_index_list=[0, 16, 32, 45],
    torch_dtype=torch.float16,
    cfg_type="none",
)

# If the loaded model is not an LCM model, merge LCM-LoRA for few-step sampling
stream.load_lcm_lora()
stream.fuse_lora()
# Use a Tiny VAE (TAESD) for faster decoding
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

prompt = "1girl with dog hair, thick frame glasses"
# Prepare the stream with the prompt
stream.prepare(prompt)

# Warmup: run at least len(t_index_list) iterations to fill the stream batch
for _ in range(4):
    stream()

# Generate images interactively
while True:
    x_output = stream.txt2img()
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break

Faster Generation with TensorRT

To achieve even faster generation, you can integrate TensorRT acceleration:

from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

stream = accelerate_with_tensorrt(
    stream, "engines", max_batch_size=2,
)

Building the engine requires the TensorRT extension and takes time up front, but once built it significantly boosts inference speed.
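
The accelerated stream keeps the same calling interface as before, so, assuming the txt2img setup from the earlier example, it can be driven unchanged (a sketch of typical usage, not additional required code):

# The accelerated stream is used exactly like the original one
stream.prepare("1girl with dog hair, thick frame glasses")
for _ in range(4):
    stream()  # warmup passes to fill the stream batch
image = postprocess_image(stream.txt2img(), output_type="pil")[0]
image.show()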

Real-Time Demos

StreamDiffusion includes interactive demos for both text-to-image and image-to-image generation.

  • Real-Time Txt2Img Demo
  • Real-Time Img2Img Demo (webcam/screen capture)

Why Use StreamDiffusion

StreamDiffusion stands out due to its innovative approach to optimizing diffusion models for real-time applications. Its core strength lies in a suite of features designed to maximize efficiency and speed.

Key Features

  • Stream Batch: Rather than waiting for one image to finish all of its denoising steps, consecutive inputs at different denoising stages are batched into a single U-Net pass, raising throughput.
  • Residual Classifier-Free Guidance (RCFG): An improved guidance mechanism that reduces the number of U-Net evaluations needed for negative conditioning compared to conventional CFG. It supports "Self-Negative" and "Onetime-Negative" configurations.
  • Stochastic Similarity Filter: Improves GPU utilization by probabilistically skipping computation when consecutive frames barely change, which is ideal for video inputs such as webcams (see the sketch after this list).
  • IO Queues: Decouples input and output handling from the denoising loop for smoother execution.
  • Pre-Computation for KV-Caches: Computes prompt-dependent intermediate results ahead of time to accelerate per-frame processing.
  • Model Acceleration Tools: Integrates tools such as TensorRT and the Tiny VAE (TAESD) for additional speedups.
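
To make the Stochastic Similarity Filter concrete, here is a minimal, hypothetical sketch of the idea (an illustration only, not the library's implementation; the function name and threshold are made up): the denoising step is skipped with a probability that grows as consecutive frames become more similar.

import torch
import torch.nn.functional as F

def maybe_skip(prev_frame: torch.Tensor, cur_frame: torch.Tensor, threshold: float = 0.98) -> bool:
    # Cosine similarity between the flattened frames (1.0 = identical)
    sim = F.cosine_similarity(prev_frame.flatten(), cur_frame.flatten(), dim=0).item()
    if sim <= threshold:
        return False  # frames differ enough: always run the pipeline
    # Map similarity in (threshold, 1] onto a skip probability in (0, 1]
    skip_prob = (sim - threshold) / (1.0 - threshold)
    return torch.rand(()).item() < skip_prob

When a frame is judged static, the previous output can simply be shown again instead of re-running the U-Net, which is where the GPU savings on near-static video come from.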

Performance Benchmarks

Measured on an RTX 4090 GPU with a Core i9-13900K CPU under Ubuntu 22.04.3 LTS, StreamDiffusion reaches the following frame rates:

Model                 Denoising Steps   FPS (Txt2Img)   FPS (Img2Img)
SD-turbo              1                 106.16          93.897
LCM-LoRA + KohakuV2   4                 38.023          37.133

These benchmarks demonstrate StreamDiffusion's capacity for high-speed generation: at 106.16 fps, each frame takes roughly 9.4 ms end to end, fast enough for genuinely interactive AI applications.

Links

For more information, see the GitHub repository (https://github.com/cumulo-autumn/StreamDiffusion) and the paper 'StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation' (arXiv:2312.12491).