StreamDiffusion: Real-Time Interactive Generation with Diffusion Pipelines

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

StreamDiffusion is a diffusion pipeline designed for real-time interactive generation. It is a pipeline-level solution that speeds up existing diffusion-based image generation for both text-to-image and image-to-image workloads, using a set of optimizations aimed at computational efficiency and GPU utilization.

Repository Information

Analyzed by OSRepos on December 13, 2025

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

StreamDiffusion is developed by cumulo-autumn and was introduced in the paper 'StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation'. It works at the pipeline level, optimizing how existing diffusion models are run so that image generation is fast enough for interactive use.

Installation

To get started with StreamDiffusion, follow these steps:

Step 0: Clone the Repository

git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion

Step 1: Make Environment

Create an isolated Python environment with conda or venv. A Docker-based setup is covered under Docker Installation below.

Using Conda:

conda create -n streamdiffusion python=3.10
conda activate streamdiffusion

Using Venv:

python -m venv .venv
# Windows
.\.venv\Scripts\activate
# Linux
source .venv/bin/activate

Step 2: Install PyTorch

Select the appropriate version for your system.

CUDA 11.8:

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118

CUDA 12.1:

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121

For more details, visit the PyTorch website.
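
Before moving on to Step 3, it can help to confirm that the installed PyTorch build actually sees the GPU. The following is a minimal sketch, not part of the StreamDiffusion repository; the helper name describe_torch_install is made up for illustration, and it only uses standard PyTorch attributes.

```python
def describe_torch_install():
    """Return a one-line summary of the local PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} is installed, but no CUDA device is visible"

    # torch.version.cuda is the CUDA version the wheel was built against
    # (for example 11.8 or 12.1 for the commands above).
    return (
        f"PyTorch {torch.__version__} built for CUDA {torch.version.cuda}, "
        f"device: {torch.cuda.get_device_name(0)}"
    )


print(describe_torch_install())
```

If the summary reports that no CUDA device is visible, the wheel index (cu118 or cu121) probably does not match the installed driver.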

Step 3: Install StreamDiffusion

For Users

Install StreamDiffusion:

# For Latest Version (recommended)
pip install git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]

# Or for Stable Version
pip install streamdiffusion[tensorrt]

Install TensorRT extension:

python -m streamdiffusion.tools.install-tensorrt

(Only for Windows) You may need to install pywin32 additionally if you installed the Stable Version (pip install streamdiffusion[tensorrt]):

pip install --force-reinstall pywin32

For Developers

python setup.py develop easy_install streamdiffusion[tensorrt]
python -m streamdiffusion.tools.install-tensorrt

Docker Installation (TensorRT Ready)

git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion
docker build -t stream-diffusion:latest -f Dockerfile .
docker run --gpus all -it -v $(pwd):/home/ubuntu/streamdiffusion stream-diffusion:latest

Examples

StreamDiffusion provides various examples to demonstrate its capabilities, including real-time text-to-image and image-to-image generation. You can find more detailed examples in the examples directory of the repository.

Image-to-Image Example

This example shows how to use StreamDiffusion for real-time image-to-image generation:

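import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from diffusers.utils import load_image

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

# Load a base model with diffusers' StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion; two entries in t_index_list mean two denoising steps
stream = StreamDiffusion(
    pipe,
    t_index_list=[32, 45],
    torch_dtype=torch.float16,
)

# Load and fuse LCM-LoRA (needed when the base model is not an LCM)
stream.load_lcm_lora()
stream.fuse_lora()
# Use the Tiny AutoEncoder (TAESD) for faster encoding and decoding
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

# Prepare the stream with the prompt
prompt = "1girl with dog hair, thick frame glasses"
stream.prepare(prompt)

# Load the input image and resize it to 512x512
init_image = load_image("assets/img2img_example.png").resize((512, 512))

# Warm up: at least len(t_index_list) x frame_buffer_size iterations
for _ in range(2):
    stream(init_image)

# Generate repeatedly until the user types 'stop'
while True:
    x_output = stream(init_image)
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break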

Text-to-Image Example

Here is a basic example for real-time text-to-image generation:

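import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

# Load a base model with diffusers' StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion.
# Text-to-image uses a longer t_index_list (more denoising steps),
# and cfg_type="none" is the setting recommended upstream for this mode.
stream = StreamDiffusion(
    pipe,
    t_index_list=[0, 16, 32, 45],
    torch_dtype=torch.float16,
    cfg_type="none",
)

# Load and fuse LCM-LoRA (needed when the base model is not an LCM)
stream.load_lcm_lora()
stream.fuse_lora()
# Use the Tiny AutoEncoder (TAESD) for faster decoding
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

# Prepare the stream with the prompt
prompt = "1girl with dog hair, thick frame glasses"
stream.prepare(prompt)

# Warm up: at least len(t_index_list) x frame_buffer_size iterations
for _ in range(4):
    stream()

# Generate repeatedly until the user types 'stop'
while True:
    x_output = stream.txt2img()
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break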
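
Both examples run the stream a few times before reading any output: twice for the two-step image-to-image configuration and four times for the four-step text-to-image configuration. The upstream README states the rule as "warmup >= len(t_index_list) x frame_buffer_size". The helper below is a small sketch of that arithmetic; warmup_iterations is a hypothetical name, and frame_buffer_size is assumed to default to 1, as in the examples above where it is not set.

```python
def warmup_iterations(t_index_list, frame_buffer_size=1):
    """Minimum number of warm-up calls before the stream's output is meaningful.

    Hypothetical helper: it applies the rule of thumb
    len(t_index_list) * frame_buffer_size and nothing more.
    """
    if frame_buffer_size < 1:
        raise ValueError("frame_buffer_size must be at least 1")
    return len(t_index_list) * frame_buffer_size


# The two configurations used in the examples above
print(warmup_iterations([32, 45]))         # image-to-image example
print(warmup_iterations([0, 16, 32, 45]))  # text-to-image example
```

Deriving the count from t_index_list avoids having to update a hard-coded range() when the number of denoising steps changes.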

Faster Generation with TensorRT

To achieve even faster generation, you can integrate TensorRT acceleration:

from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

stream = accelerate_with_tensorrt(
    stream, "engines", max_batch_size=2,
)

This requires the TensorRT extension and time to build the engine, but it significantly boosts performance.
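
Because the TensorRT extension is optional, a script may want to fall back to the plain PyTorch path when it is not installed. The wrapper below is one possible sketch under that assumption; accelerate_if_available is a hypothetical helper, and only the accelerate_with_tensorrt call shown above comes from the project.

```python
def accelerate_if_available(stream, engine_dir="engines", max_batch_size=2):
    """Return (stream, used_tensorrt).

    Hypothetical wrapper: tries the TensorRT path and falls back to the
    unmodified stream when the extension cannot be imported.
    """
    try:
        from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt
    except ImportError:
        # streamdiffusion or its TensorRT extension is missing; keep the PyTorch path.
        return stream, False

    # Same call as in the snippet above; building the engine may take a while.
    accelerated = accelerate_with_tensorrt(
        stream, engine_dir, max_batch_size=max_batch_size
    )
    return accelerated, True
```

Usage would look like stream, used_tensorrt = accelerate_if_available(stream), after which the rest of the script can stay the same for both paths.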

Real-Time Demos

StreamDiffusion includes interactive demos for both text-to-image and image-to-image generation.

  • Real-Time Txt2Img Demo
  • Real-Time Img2Img Demo (Webcam/Screen Capture)

Why Use StreamDiffusion

StreamDiffusion targets real-time applications by combining several pipeline-level optimizations, each aimed at reducing redundant computation or keeping the GPU busy.

Key Features

  • Stream Batch: Streamlined data processing through efficient batch operations.
  • Residual Classifier-Free Guidance (RCFG): A guidance mechanism that avoids the redundant UNet passes of conventional CFG. For N denoising steps, conventional CFG needs 2N UNet evaluations, whereas RCFG needs N in its "Self-Negative" configuration and N+1 in its "Onetime-Negative" configuration.
  • Stochastic Similarity Filter: Enhances GPU utilization efficiency by reducing processing when there is little change between frames, ideal for video inputs.
  • IO Queues: Efficiently manages input and output operations for smoother execution.
  • Pre-Computation for KV-Caches: Optimizes caching strategies for accelerated processing.
  • Model Acceleration Tools: Utilizes various tools for model optimization and performance boost, including TensorRT integration.
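
To make the Stochastic Similarity Filter more concrete, here is a minimal pure-Python sketch of the idea: compare the current frame with a reference frame and skip processing with a probability that grows as the two become more similar. It is not the library's implementation. The function names, the flat-list frame representation, and the 0.98 threshold are assumptions chosen for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of pixel values."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def skip_probability(similarity, threshold):
    """Probability of skipping a frame: 0 at or below the threshold, rising linearly to 1."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    p = (similarity - threshold) / (1.0 - threshold)
    return min(1.0, max(0.0, p))


def should_skip(current, reference, threshold=0.98, rng=random):
    """Randomly decide whether to skip the current frame (threshold value is an assumption)."""
    p = skip_probability(cosine_similarity(current, reference), threshold)
    return rng.random() < p


# A frame identical to the reference is skipped with probability 1,
# while a clearly different frame is never skipped.
still = [0.2, 0.4, 0.6, 0.8]
moved = [0.8, 0.1, 0.9, 0.0]
print(skip_probability(cosine_similarity(still, still), 0.98))
print(skip_probability(cosine_similarity(still, moved), 0.98))
```

Making the decision probabilistic rather than a hard cutoff avoids the output freezing abruptly when the input hovers around the threshold.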

Performance Benchmarks

On an RTX 4090 GPU with a Core i9-13900K CPU running Ubuntu 22.04.3 LTS, StreamDiffusion reports the following frame rates:

Model                 Denoising steps   Txt2Img (fps)   Img2Img (fps)
SD-turbo              1                 106.16          93.897
LCM-LoRA + KohakuV2   4                 38.023          37.133

Both configurations exceed 30 fps on this hardware, and the single-step SD-turbo configuration runs at roughly 94 to 106 fps, which leaves room for interactive use.
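
Frame rates are easier to reason about as a per-frame time budget. The short sketch below converts the reported fps figures into milliseconds per frame. The numbers come from the benchmark table above; the dictionary layout and the function name frame_time_ms are only for illustration.

```python
# Reported frame rates from the benchmark table (RTX 4090, Core i9-13900K)
BENCHMARKS = {
    ("SD-turbo", 1): {"txt2img": 106.16, "img2img": 93.897},
    ("LCM-LoRA + KohakuV2", 4): {"txt2img": 38.023, "img2img": 37.133},
}


def frame_time_ms(fps):
    """Convert frames per second into milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for (model, steps), rates in BENCHMARKS.items():
    for mode, fps in rates.items():
        print(f"{model} ({steps} step(s), {mode}): {frame_time_ms(fps):.2f} ms per frame")
```

By this arithmetic the single-step configuration takes roughly 10 ms per frame and the four-step configuration roughly 26 to 27 ms.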
