StreamDiffusion: Real-Time Interactive Generation with Diffusion Pipelines

Summary
StreamDiffusion is an innovative diffusion pipeline designed for real-time interactive generation, significantly enhancing the performance of current diffusion-based image generation techniques. It offers a pipeline-level solution for high-speed text-to-image and image-to-image generation, making interactive AI experiences more accessible. The project introduces several key features that optimize computational efficiency and GPU utilization.
Introduction
StreamDiffusion is a groundbreaking diffusion pipeline developed by cumulo-autumn, offering a pipeline-level solution for real-time interactive generation. This project aims to significantly enhance the performance of existing diffusion-based image generation techniques, enabling faster and more responsive AI art creation. It was introduced in the paper 'StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation' and is built on a foundation of innovative optimization strategies.
Installation
To get started with StreamDiffusion, follow these steps:
Step 0: Clone the Repository
git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion
Step 1: Make Environment
You can install StreamDiffusion via pip, conda, or Docker.
Using Conda:
conda create -n streamdiffusion python=3.10
conda activate streamdiffusion
Using Venv:
python -m venv .venv
# Windows
.\.venv\Scripts\activate
# Linux
source .venv/bin/activate
Step 2: Install PyTorch
Select the appropriate version for your system.
CUDA 11.8:
pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118
CUDA 12.1:
pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121
For more details, visit the PyTorch website.
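To quickly verify that PyTorch was installed with CUDA support before proceeding:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"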
Step 3: Install StreamDiffusion
For Users
Install StreamDiffusion:
# For Latest Version (recommended)
pip install git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]
# Or for Stable Version
pip install streamdiffusion[tensorrt]
Install TensorRT extension:
python -m streamdiffusion.tools.install-tensorrt
(Windows only) If you installed the stable version (pip install streamdiffusion[tensorrt]), you may additionally need to install pywin32:
pip install --force-reinstall pywin32
For Developers
python setup.py develop easy_install streamdiffusion[tensorrt]
python -m streamdiffusion.tools.install-tensorrt
Docker Installation (TensorRT Ready)
git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion
docker build -t stream-diffusion:latest -f Dockerfile .
docker run --gpus all -it -v $(pwd):/home/ubuntu/streamdiffusion stream-diffusion:latest
Examples
StreamDiffusion provides various examples to demonstrate its capabilities, including real-time text-to-image and image-to-image generation. You can find more detailed examples in the examples directory of the repository.
Image-to-Image Example
This example shows how to use StreamDiffusion for real-time image-to-image generation:
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from diffusers.utils import load_image
from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline; t_index_list selects which denoising timesteps to run
stream = StreamDiffusion(
    pipe,
    t_index_list=[32, 45],
    torch_dtype=torch.float16,
)

# Fuse LCM-LoRA weights for few-step denoising
stream.load_lcm_lora()
stream.fuse_lora()
# Swap in the tiny autoencoder for faster decoding
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
pipe.enable_xformers_memory_efficient_attention()

prompt = "1girl with dog hair, thick frame glasses"
stream.prepare(prompt)

init_image = load_image("assets/img2img_example.png").resize((512, 512))

# Warm up the stream batch: one call per entry in t_index_list
for _ in range(2):
    stream(init_image)

# Run the stream
while True:
    x_output = stream(init_image)
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break
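To feed live frames instead of a fixed image, a capture loop along the following lines can reuse the stream object prepared above. This is a hypothetical sketch, not the repository's demo code: it assumes opencv-python and numpy are installed, and the window and quit handling are illustrative choices.
import cv2
import numpy as np
from PIL import Image

cap = cv2.VideoCapture(0)  # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV delivers BGR arrays; convert to an RGB PIL image at the model resolution
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).resize((512, 512))
    x_output = stream(image)
    result = postprocess_image(x_output, output_type="pil")[0]
    cv2.imshow("StreamDiffusion", cv2.cvtColor(np.array(result), cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()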
Text-to-Image Example
Here is a basic example for real-time text-to-image generation:
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Four denoising timesteps this time; cfg_type="none" disables classifier-free guidance
stream = StreamDiffusion(
    pipe,
    t_index_list=[0, 16, 32, 45],
    torch_dtype=torch.float16,
    cfg_type="none",
)

stream.load_lcm_lora()
stream.fuse_lora()
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
pipe.enable_xformers_memory_efficient_attention()

prompt = "1girl with dog hair, thick frame glasses"
stream.prepare(prompt)

# Warm up the stream batch: one call per entry in t_index_list
for _ in range(4):
    stream()

# Run the stream
while True:
    x_output = stream.txt2img()
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break
Faster Generation with TensorRT
To achieve even faster generation, you can integrate TensorRT acceleration:
from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

stream = accelerate_with_tensorrt(
    stream, "engines", max_batch_size=2,
)
This requires the TensorRT extension and takes some time up front to build the engine, but it significantly boosts performance once built.
Real-Time Demos
StreamDiffusion includes interactive demos for both modes: a real-time txt2img demo and a real-time img2img demo that accepts webcam or screen-capture input. Both can be found in the demo directory of the repository.
Why Use StreamDiffusion
StreamDiffusion stands out due to its innovative approach to optimizing diffusion models for real-time applications. Its core strength lies in a suite of features designed to maximize efficiency and speed.
Key Features
- Stream Batch: batches the denoising steps of consecutive input frames into a single U-Net pass, so one forward call advances several frames at different timesteps instead of processing each frame's steps sequentially.
- Residual Classifier-Free Guidance (RCFG): an improved guidance mechanism that minimizes computational redundancy, cutting the U-Net evaluations needed for n denoising steps from 2n under conventional CFG to n ("Self-Negative") or n+1 ("Onetime-Negative").
- Stochastic Similarity Filter: saves GPU work on video input by probabilistically skipping inference when consecutive frames barely change (a minimal sketch follows this list).
- IO Queues: decouples input and output handling from the denoising loop for smoother execution.
- Pre-Computation for KV-Caches: precomputes and caches values that stay constant across frames, such as prompt embeddings, to accelerate per-frame processing.
- Model Acceleration Tools: applies model optimization tooling, including TensorRT integration, for a further performance boost.
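To make the Stochastic Similarity Filter concrete, here is a minimal sketch in plain PyTorch. It is illustrative only, not StreamDiffusion's implementation: the class name, the cosine-similarity measure, and the linear mapping from similarity to skip probability are all assumptions.
import torch
import torch.nn.functional as F

class StochasticSimilarityFilterSketch:
    """Illustrative only: probabilistically skip inference on near-static frames."""

    def __init__(self, threshold: float = 0.98):
        self.threshold = threshold  # similarity above this may trigger a skip
        self.prev_frame = None

    def should_skip(self, frame: torch.Tensor) -> bool:
        if self.prev_frame is None:
            self.prev_frame = frame
            return False  # nothing to compare against yet
        sim = F.cosine_similarity(
            frame.flatten().float(), self.prev_frame.flatten().float(), dim=0
        ).item()
        self.prev_frame = frame
        if sim < self.threshold:
            return False  # clear change between frames: always run the pipeline
        # Map similarity in [threshold, 1] to a skip probability in [0, 1]:
        # the more static the scene, the more frames are skipped.
        skip_prob = (sim - self.threshold) / (1.0 - self.threshold)
        return torch.rand(()).item() < skip_prob
The stochastic element is the point: a hard similarity threshold would freeze the output completely on a static scene, whereas probabilistic skipping still refreshes it from time to time.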
Performance Benchmarks
Measured on an RTX 4090 GPU with a Core i9-13900K CPU under Ubuntu 22.04.3 LTS, StreamDiffusion achieves the following frame rates:
| Model | Denoising Steps | Txt2Img (fps) | Img2Img (fps) |
|---|---|---|---|
| SD-turbo | 1 | 106.16 | 93.897 |
| LCM-LoRA + KohakuV2 | 4 | 38.023 | 37.133 |
These benchmarks demonstrate StreamDiffusion's capacity for high-speed generation: 106.16 fps corresponds to roughly 9.4 ms per frame, making it a practical tool for interactive AI applications.
Links
For more information and to explore the project further, please visit the official links:
- GitHub Repository: https://github.com/cumulo-autumn/StreamDiffusion
- arXiv Paper: https://arxiv.org/abs/2312.12491
- Hugging Face Papers: https://huggingface.co/papers/2312.12491