CineScale: Unlocking 4K High-Resolution Cinematic Video Generation

Summary

CineScale is a GitHub repository by Eyeline-Labs that extends FreeScale to enable high-resolution cinematic video generation. It provides the models and tools needed to produce video output at up to 4K resolution, leveraging diffusion models for advanced visual content creation, and offers a solid framework for researchers and developers working on high-definition video synthesis.

Repository Info

Updated on December 18, 2025

Introduction

CineScale, developed by Eyeline-Labs, is an open-source project that extends the capabilities of FreeScale for high-resolution cinematic visual generation, specifically unlocking 4K video output. This repository provides the code and models necessary for generating stunning, high-definition videos using advanced diffusion models. It is the result of collaborative research by Haonan Qiu, Ning Yu, Ziqi Huang, Paul Debevec, and Ziwei Liu from Nanyang Technological University and Netflix Eyeline Studios.

For more details, you can explore the arXiv paper and the Project Page.

Installation

To get started with CineScale, follow these steps to set up your environment using Anaconda:

git clone https://github.com/Eyeline-Labs/CineScale.git
cd CineScale

conda create -n cinescale python=3.10
conda activate cinescale
pip install -e .
pip install "xfuser>=0.4.3"
pip install flash-attn==2.7.4.post1 --no-build-isolation

Examples

CineScale offers various models and inference commands for different resolutions and tasks. First, ensure you download the necessary checkpoints from Hugging Face and place them in the models folder.
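Before launching any of the inference commands below, it can help to confirm the checkpoints actually landed in the models folder. A minimal sketch (the folder-check logic is ours, not part of the repository's tooling):

```shell
# Sketch: verify that models/ exists and is non-empty before running inference.
MODELS_DIR="models"
if [ -d "$MODELS_DIR" ] && [ -n "$(ls -A "$MODELS_DIR" 2>/dev/null)" ]; then
  echo "checkpoints found in $MODELS_DIR"
else
  echo "no checkpoints in $MODELS_DIR -- download them from Hugging Face first" >&2
fi
```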

2K-Resolution Text-to-Video (Base Model Wan2.1-1.3B)

Single GPU:

CUDA_VISIBLE_DEVICES=0 python cinescale_t2v1.3b_single.py

Multiple GPUs:

torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b.py

3K-Resolution Text-to-Video (Base Model Wan2.1-1.3B)

torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b_pro.py

4K-Resolution Text-to-Video (Base Model Wan2.1-14B)

torchrun --standalone --nproc_per_node=8 cinescale_t2v14b_pro.py

4K-Resolution Image-to-Video (Base Model Wan2.1-14B)

# May set attention_coef to 1.5 for better results (line 123, diffsynth/distributed/xdit_context_parallel.py)

torchrun --standalone --nproc_per_node=8 cinescale_i2v14b.py
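Rather than editing the file by hand, the attention_coef change mentioned above can be scripted. This is a sketch, not the project's own tooling: it assumes line 123 contains a plain `attention_coef = <number>` assignment, so inspect the file before relying on it.

```shell
# Sketch: raise attention_coef to 1.5 in place, keeping a .bak backup.
# Assumes the file contains an assignment like "attention_coef = <number>".
FILE="diffsynth/distributed/xdit_context_parallel.py"
if [ -f "$FILE" ]; then
  sed -i.bak -E 's/(attention_coef[[:space:]]*=[[:space:]]*)[0-9.]+/\11.5/' "$FILE"
fi
```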

Why Use CineScale?

CineScale stands out for its ability to generate high-resolution cinematic videos, pushing the boundaries of what is possible with generative AI. By extending existing models such as FreeScale and Wan2.1, it gives researchers and developers a practical framework for producing visual content at up to 4K resolution. Its support for both text-to-video and image-to-video generation, along with models sized for different GPU configurations, makes it a powerful tool for advanced video synthesis.
