StreamDiffusion: Real-Time Interactive Generation with Diffusion Pipelines
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
StreamDiffusion is a diffusion pipeline for real-time interactive image generation. It works at the pipeline level to speed up existing diffusion-based text-to-image and image-to-image generation, and it adds several features that reduce redundant computation and improve GPU utilization.
Introduction
StreamDiffusion is a diffusion pipeline developed by cumulo-autumn that offers a pipeline-level solution for real-time interactive generation. It aims to make existing diffusion-based image generation fast enough for responsive, interactive use. It was introduced in the paper 'StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation'.
Installation
To get started with StreamDiffusion, follow these steps:
Step 0: Clone the Repository
git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion
Step 1: Make Environment
You can install StreamDiffusion via pip, conda, or Docker.
Using Conda:
conda create -n streamdiffusion python=3.10
conda activate streamdiffusion
Using Venv:
python -m venv .venv
# Windows
.\.venv\Scripts\activate
# Linux
source .venv/bin/activate
Step 2: Install PyTorch
Select the appropriate version for your system.
CUDA 11.8:
pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118
CUDA 12.1:
pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121
For more details, visit the PyTorch website.
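Before moving on to Step 3, it can help to confirm that the PyTorch build you installed actually sees your GPU. The following is a minimal sketch, not part of the StreamDiffusion repository: the function name `describe_torch_environment` is hypothetical, and it relies only on standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
import importlib.util


def describe_torch_environment():
    """Return a small report on the installed PyTorch build.

    Hypothetical helper for illustration; it does not fail if torch is missing.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch  # imported lazily so the check above can run without torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds)
        "cuda_build": torch.version.cuda,
        # True only if a usable CUDA device and driver are present
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, the wheel probably does not match your installed CUDA driver, and the examples below, which move the pipeline to `"cuda"`, will not run.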
Step 3: Install StreamDiffusion
For Users
Install StreamDiffusion:
# For Latest Version (recommended)
pip install git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]
# Or for Stable Version
pip install streamdiffusion[tensorrt]
Install TensorRT extension:
python -m streamdiffusion.tools.install-tensorrt
(Only for Windows) You may need to install pywin32 additionally if you installed the Stable Version (pip install streamdiffusion[tensorrt]):
pip install --force-reinstall pywin32
For Developers
python setup.py develop easy_install streamdiffusion[tensorrt]
python -m streamdiffusion.tools.install-tensorrt
Docker Installation (TensorRT Ready)
git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion
docker build -t stream-diffusion:latest -f Dockerfile .
docker run --gpus all -it -v $(pwd):/home/ubuntu/streamdiffusion stream-diffusion:latest
Examples
StreamDiffusion provides various examples to demonstrate its capabilities, including real-time text-to-image and image-to-image generation. You can find more detailed examples in the examples directory of the repository.
Image-to-Image Example
This example shows how to use StreamDiffusion for real-time image-to-image generation:
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from diffusers.utils import load_image

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

# Load a model with diffusers' StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion
stream = StreamDiffusion(
    pipe,
    t_index_list=[32, 45],
    torch_dtype=torch.float16,
)

# Load and fuse the LCM-LoRA weights into the model
stream.load_lcm_lora()
stream.fuse_lora()
# Swap in the Tiny VAE for faster encoding and decoding
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

prompt = "1girl with dog hair, thick frame glasses"
# Prepare the stream with the prompt
stream.prepare(prompt)

# Load the input image and resize it to 512x512
init_image = load_image("assets/img2img_example.png").resize((512, 512))

# Warm up: one call per entry in t_index_list
for _ in range(2):
    stream(init_image)

# Generate until the user types 'stop'
while True:
    x_output = stream(init_image)
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break
Text-to-Image Example
Here is a basic example for real-time text-to-image generation:
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

# Load a model with diffusers' StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion; cfg_type="none" is used for text-to-image
stream = StreamDiffusion(
    pipe,
    t_index_list=[0, 16, 32, 45],
    torch_dtype=torch.float16,
    cfg_type="none",
)

# Load and fuse the LCM-LoRA weights into the model
stream.load_lcm_lora()
stream.fuse_lora()
# Swap in the Tiny VAE for faster decoding
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

prompt = "1girl with dog hair, thick frame glasses"
# Prepare the stream with the prompt
stream.prepare(prompt)

# Warm up: one call per entry in t_index_list
for _ in range(4):
    stream()

# Generate until the user types 'stop'
while True:
    x_output = stream.txt2img()
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break
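The warm-up loops in the two examples run 2 and 4 iterations, which matches the length of each `t_index_list`. The sketch below captures that rule under the assumption that the required warm-up count is `len(t_index_list)` multiplied by the frame buffer size. The helper name `warmup_iterations` is hypothetical, and `frame_buffer_size` is assumed to default to 1 as in the examples above.

```python
def warmup_iterations(t_index_list, frame_buffer_size=1):
    """Return the minimum number of warm-up calls before outputs are usable.

    Assumption for illustration: the pipeline needs one call per denoising
    step index, multiplied by the number of frames buffered per call.
    """
    if frame_buffer_size < 1:
        raise ValueError("frame_buffer_size must be at least 1")
    return len(t_index_list) * frame_buffer_size


# Values taken from the two examples above
print(warmup_iterations([32, 45]))         # image-to-image example
print(warmup_iterations([0, 16, 32, 45]))  # text-to-image example
```

Deriving the loop count this way keeps the warm-up in sync if you later change `t_index_list`.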
Faster Generation with TensorRT
To achieve even faster generation, you can integrate TensorRT acceleration:
from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

stream = accelerate_with_tensorrt(
    stream, "engines", max_batch_size=2,
)
This requires the TensorRT extension and time to build the engine, but it significantly boosts performance.
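To see how much an acceleration step helps on your own hardware, you can time the generation call before and after enabling it. This is a generic timing sketch, not a StreamDiffusion API: `measure_fps` is a hypothetical helper that accepts any zero-argument callable.

```python
import time


def measure_fps(step, warmup=2, iterations=20):
    """Call `step` repeatedly and return the measured calls per second.

    Hypothetical helper for illustration; `step` is any zero-argument callable.
    """
    # Untimed warm-up calls so one-off setup costs are excluded
    for _ in range(warmup):
        step()

    start = time.perf_counter()
    for _ in range(iterations):
        step()
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse timers
    if elapsed <= 0:
        return float("inf")
    return iterations / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a call into your own pipeline
    print(f"{measure_fps(lambda: sum(range(10_000))):.1f} calls/s")
```

With the examples above, you might pass something like `lambda: postprocess_image(stream(init_image), output_type="pil")`. Because GPU work runs asynchronously, the timed step should include an operation that waits for the result (such as converting the output to a PIL image), otherwise the figure may be overstated.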
Real-Time Demos
StreamDiffusion includes interactive demos for real-time text-to-image generation and for real-time image-to-image generation from webcam or screen-capture input.
Why Use StreamDiffusion
StreamDiffusion optimizes diffusion models for real-time applications. Its core strength is a set of features, listed below, that each target efficiency and speed.
Key Features
- Stream Batch: Streamlined data processing through efficient batch operations.
- Residual Classifier-Free Guidance (RCFG): An improved guidance mechanism that minimizes computational redundancy, offering competitive complexity compared to traditional CFG. It supports "Self-Negative" and "Onetime-Negative" configurations.
- Stochastic Similarity Filter: Enhances GPU utilization efficiency by reducing processing when there is little change between frames, ideal for video inputs.
- IO Queues: Efficiently manages input and output operations for smoother execution.
- Pre-Computation for KV-Caches: Optimizes caching strategies for accelerated processing.
- Model Acceleration Tools: Utilizes various tools for model optimization and performance boost, including TensorRT integration.
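To make the Stochastic Similarity Filter more concrete, here is a minimal pure-Python sketch of the idea: compare the current frame with a reference frame and skip processing with a probability that grows as the two become more similar. This illustrates the technique as I understand it from the paper, not the repository's implementation. The function names, the flat-list frame representation, and the default threshold of 0.98 are assumptions.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def skip_probability(similarity, threshold):
    """Map similarity to a skip probability in [0, 1].

    Below the threshold the frame is never skipped; above it the
    probability rises linearly and reaches 1 at similarity == 1.
    """
    if threshold >= 1.0:
        return 0.0
    p = (similarity - threshold) / (1.0 - threshold)
    return min(1.0, max(0.0, p))


def should_skip(current, reference, threshold=0.98, rng=random.random):
    """Randomly decide whether to skip the current frame."""
    p = skip_probability(cosine_similarity(current, reference), threshold)
    return rng() < p


if __name__ == "__main__":
    static_frame = [0.2, 0.5, 0.9, 0.1]
    changed_frame = [0.9, 0.1, 0.0, 0.7]
    print(should_skip(static_frame, static_frame))   # almost always skipped
    print(should_skip(changed_frame, static_frame))  # never skipped
```

Making the decision probabilistic rather than a hard cutoff avoids the output freezing abruptly when the input hovers around the threshold.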
Performance Benchmarks
On an RTX 4090 GPU with a Core i9-13900K CPU running Ubuntu 22.04.3 LTS, StreamDiffusion achieves the following frame rates:
| Model | Denoising Steps | Txt2Img (fps) | Img2Img (fps) |
|---|---|---|---|
| SD-turbo | 1 | 106.16 | 93.897 |
| LCM-LoRA + KohakuV2 | 4 | 38.023 | 37.133 |
These benchmarks demonstrate StreamDiffusion's capability to deliver high-speed generation, making it a powerful tool for interactive AI applications.
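For interactive use, per-frame latency is often easier to reason about than frames per second. The short sketch below converts the reported frame rates to milliseconds per frame using latency = 1000 / fps. The figures are the ones from the table; the dictionary layout and function name are only illustrative.

```python
# Frame rates reported in the benchmark table (frames per second)
REPORTED_FPS = {
    ("SD-turbo", "txt2img"): 106.16,
    ("SD-turbo", "img2img"): 93.897,
    ("LCM-LoRA + KohakuV2", "txt2img"): 38.023,
    ("LCM-LoRA + KohakuV2", "img2img"): 37.133,
}


def frame_latency_ms(fps):
    """Convert frames per second to average milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    for (model, mode), fps in REPORTED_FPS.items():
        print(f"{model} {mode}: {frame_latency_ms(fps):.2f} ms per frame")
```

By this conversion, the one-step SD-turbo configuration averages roughly 9 to 11 ms per frame, while the four-step LCM-LoRA configuration averages roughly 26 to 27 ms per frame.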
Links
For more information and to explore the project further, please visit the official links:
- GitHub Repository: https://github.com/cumulo-autumn/StreamDiffusion
- arXiv Paper: https://arxiv.org/abs/2312.12491
- Hugging Face Papers: https://huggingface.co/papers/2312.12491