CineScale: Unlocking 4K High-Resolution Cinematic Video Generation

Summary

CineScale is a GitHub repository by Eyeline-Labs that extends FreeScale to enable high-resolution cinematic video generation. It provides the models and tools needed to produce video output at up to 4K resolution, leveraging diffusion models for advanced visual content creation, and offers a solid framework for researchers and developers working on high-definition video synthesis.

Repository Info

Updated on December 18, 2025

Introduction

CineScale, developed by Eyeline-Labs, is an open-source project that extends the capabilities of FreeScale for high-resolution cinematic visual generation, specifically unlocking 4K video output. This repository provides the code and models necessary for generating stunning, high-definition videos using advanced diffusion models. It is the result of collaborative research by Haonan Qiu, Ning Yu, Ziqi Huang, Paul Debevec, and Ziwei Liu from Nanyang Technological University and Netflix Eyeline Studios.

For more details, you can explore the arXiv paper and the Project Page.

Installation

To get started with CineScale, follow these steps to set up your environment using Anaconda:

git clone https://github.com/Eyeline-Labs/CineScale.git
cd CineScale

conda create -n cinescale python=3.10
conda activate cinescale
pip install -e .
pip install "xfuser>=0.4.3"
pip install flash-attn==2.7.4.post1 --no-build-isolation

Examples

CineScale offers various models and inference commands for different resolutions and tasks. First, ensure you download the necessary checkpoints from Hugging Face and place them in the models folder.
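Before launching any of the inference commands below, it can help to confirm the checkpoints actually landed in the models folder. A minimal sketch (the folder-check logic is ours, not part of the repository's tooling):

```shell
# Sketch: verify that models/ exists and is non-empty before running inference.
MODELS_DIR="models"
if [ -d "$MODELS_DIR" ] && [ -n "$(ls -A "$MODELS_DIR" 2>/dev/null)" ]; then
  echo "checkpoints found in $MODELS_DIR"
else
  echo "no checkpoints in $MODELS_DIR -- download them from Hugging Face first" >&2
fi
```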

2K-Resolution Text-to-Video (Base Model Wan2.1-1.3B)

Single GPU:

CUDA_VISIBLE_DEVICES=0 python cinescale_t2v1.3b_single.py

Multiple GPUs:

torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b.py

3K-Resolution Text-to-Video (Base Model Wan2.1-1.3B)

torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b_pro.py

4K-Resolution Text-to-Video (Base Model Wan2.1-14B)

torchrun --standalone --nproc_per_node=8 cinescale_t2v14b_pro.py

4K-Resolution Image-to-Video (Base Model Wan2.1-14B)

# May set attention_coef to 1.5 for better results (line 123, diffsynth/distributed/xdit_context_parallel.py)

torchrun --standalone --nproc_per_node=8 cinescale_i2v14b.py
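Rather than editing the file by hand, the attention_coef change mentioned above can be scripted. This is a sketch, not the project's own tooling: it assumes line 123 contains a plain `attention_coef = <number>` assignment, so inspect the file before relying on it.

```shell
# Sketch: raise attention_coef to 1.5 in place, keeping a .bak backup.
# Assumes the file contains an assignment like "attention_coef = <number>".
FILE="diffsynth/distributed/xdit_context_parallel.py"
if [ -f "$FILE" ]; then
  sed -i.bak -E 's/(attention_coef[[:space:]]*=[[:space:]]*)[0-9.]+/\11.5/' "$FILE"
fi
```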

Why Use CineScale?

CineScale stands out for its ability to generate high-resolution cinematic videos, pushing the boundaries of what is possible with generative AI. By extending existing models such as FreeScale and Wan2.1, it gives researchers and developers a practical framework for producing visual content at up to 4K resolution. Its support for both text-to-video and image-to-video generation, along with models sized for different GPU configurations, makes it a powerful tool for advanced video synthesis.
