Repository History

Explore all analyzed open source repositories

Topic: Diffusion Models
Transformer Lab App: An Open Source Platform for Frontier AI/ML Workflows

Transformer Lab App is an open-source machine learning research platform designed for frontier AI/ML workflows. It provides a comprehensive toolkit for large language models, letting users train, fine-tune, and chat with models on their own hardware, whether locally, on-prem, or in the cloud. Backed by Mozilla, this cross-platform application simplifies experimentation with a wide range of models.

Dec 31, 2025
HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation

HunyuanVideo-Avatar is a cutting-edge project by Tencent-Hunyuan for high-fidelity, audio-driven human animation. Utilizing a multimodal diffusion transformer, it generates dynamic, emotion-controllable, and multi-character dialogue videos. This innovative system addresses critical challenges in character consistency, emotion alignment, and multi-character animation, making it suitable for diverse applications like e-commerce and social media.

Dec 30, 2025
StreamDiffusion: Real-Time Interactive Generation with Diffusion Pipelines

StreamDiffusion is a diffusion pipeline designed for real-time interactive generation, substantially accelerating current diffusion-based image generation techniques. Rather than optimizing a single model, it is a pipeline-level solution for high-speed image-to-image and text-to-image generation, making interactive AI experiences more accessible. The project introduces several features that improve computational efficiency and GPU utilization.
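One of StreamDiffusion's headline optimizations is batching denoising steps across consecutive frames, so the GPU stays busy instead of denoising each frame to completion before starting the next. The toy sketch below illustrates only that pipelining idea; the `denoise_step` stand-in and step count are illustrative and are not the project's actual API:

```python
from collections import deque

def denoise_step(latent):
    # Toy stand-in for one denoising model call: shrink the "noise" by half.
    return latent * 0.5

def stream_batch(frames, T=4):
    """Pipeline denoising across frames: each incoming frame enters the
    buffer, all buffered latents advance one step together (a single
    batched model call in the real system), and a finished image exits
    once it has completed T steps."""
    buffer = deque()          # entries: (latent, steps_done)
    outputs = []
    for frame in frames:
        buffer.append((frame, 0))
        buffer = deque((denoise_step(l), s + 1) for l, s in buffer)
        if buffer[0][1] == T:
            outputs.append(buffer.popleft()[0])
    while buffer:             # drain the pipeline after the last input
        buffer = deque((denoise_step(l), s + 1) for l, s in buffer)
        if buffer[0][1] == T:
            outputs.append(buffer.popleft()[0])
    return outputs
```

After the pipeline fills, each new input produces one finished output per batched step, which is what makes per-frame latency low enough for interactive use.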

Dec 13, 2025
Step-Video-T2V: State-of-the-Art Text-to-Video Generation Model

Step-Video-T2V is a state-of-the-art 30-billion-parameter text-to-video pre-trained model capable of generating videos of up to 204 frames. It achieves high efficiency through a deep-compression Video-VAE and enhances visual quality using Direct Preference Optimization (DPO). The model's performance is validated on its novel benchmark, Step-Video-T2V-Eval, demonstrating superior text-to-video quality.

Oct 29, 2025
Leffa: Controllable Person Image Generation with Flow Fields in Attention

Leffa is a unified framework for controllable person image generation, enabling precise control of appearance through virtual try-on and of pose through pose transfer. The project addresses the common problem of fine-grained textural detail distortion by learning flow fields in attention, guiding target queries to attend to the correct reference keys. It achieves state-of-the-art performance, maintaining high image quality while significantly reducing detail distortion.
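The core idea here is that an attention map between target queries and reference keys can be read as a dense correspondence (flow) field: each target position's expected source location in the reference image. A minimal NumPy sketch of that reading follows; the shapes and function name are illustrative assumptions, not Leffa's actual implementation:

```python
import numpy as np

def attention_flow(q, k, ref_coords):
    """Attend from target queries to reference keys, then interpret the
    attention map as a flow field: each query's attention-weighted
    expected coordinate in the reference image.
    q: (Nq, d) target queries; k: (Nk, d) reference keys;
    ref_coords: (Nk, 2) spatial coordinates of the reference tokens."""
    d = q.shape[1]
    scores = q @ k.T / np.sqrt(d)                # (Nq, Nk) scaled similarity
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over reference keys
    flow = attn @ ref_coords                     # (Nq, 2) expected source coords
    return attn, flow
```

Supervising such a flow field (rather than only the final image) is what lets the model keep fine-grained reference texture aligned with the target pose or garment.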

Oct 12, 2025