HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation

This repository profile is provided by osrepos.com, an open source repository discovery platform.

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation

Summary

HunyuanVideo-Avatar is a cutting-edge project by Tencent-Hunyuan for high-fidelity, audio-driven human animation. Utilizing a multimodal diffusion transformer, it generates dynamic, emotion-controllable, and multi-character dialogue videos. This innovative system addresses critical challenges in character consistency, emotion alignment, and multi-character animation, making it suitable for diverse applications like e-commerce and social media.

Repository Information

Analyzed by OSRepos on December 30, 2025

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

HunyuanVideo-Avatar is an advanced open-source project by Tencent-Hunyuan that enables high-fidelity, audio-driven human animation for multiple characters. Built upon a multimodal diffusion transformer (MM-DiT), this model excels at generating dynamic, emotion-controllable, and multi-character dialogue videos. It tackles key challenges in the field by ensuring strong character consistency, precise emotion alignment between characters and audio, and facilitating multi-character animation.

The project introduces three core innovations:

  • Character Image Injection Module: Replaces conventional addition-based conditioning to eliminate condition mismatch, ensuring dynamic motion and strong character consistency.
  • Audio Emotion Module (AEM): Extracts and transfers emotional cues from a reference image to the generated video, enabling fine-grained emotion style control.
  • Face-Aware Audio Adapter (FAA): Isolates audio-driven characters with a latent-level face mask, allowing independent audio injection via cross-attention for multi-character scenarios.

These innovations allow HunyuanVideo-Avatar to produce realistic avatars in dynamic, immersive scenarios, surpassing state-of-the-art methods.

Installation

To get started with HunyuanVideo-Avatar, follow these installation steps, primarily for Linux environments. CUDA versions 12.4 or 11.8 are recommended.

First, clone the repository:

git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar.git
cd HunyuanVideo-Avatar

Then, set up the Conda environment and install dependencies:

# 1. Create conda environment
conda create -n HunyuanVideo-Avatar python==3.10.9

# 2. Activate the environment
conda activate HunyuanVideo-Avatar

# 3. Install PyTorch and other dependencies using conda
# For CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# For CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# 4. Install pip dependencies
python -m pip install -r requirements.txt
# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3

For specific GPU types encountering float point exceptions, refer to the repository's README for additional solutions, including CUDA 12.4/CUBLAS/CUDNN updates or explicit CUDA 11.8 compiled PyTorch installations.

Alternatively, you can use the provided Docker images:

# For CUDA 12.4
docker pull hunyuanvideo/hunyuanvideo:cuda_12
docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12
pip install gradio==3.39.0 diffusers==0.33.0 transformers==4.41.2

# For CUDA 11.8
docker pull hunyuanvideo/hunyuanvideo:cuda_11
docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
pip install gradio==3.39.0 diffusers==0.33.0 transformers==4.41.2

Pretrained models can be downloaded as detailed in the weights/README.md file within the repository.

Examples

HunyuanVideo-Avatar supports both parallel inference on multiple GPUs and single-GPU inference, including options for very low VRAM environments.

Parallel Inference on Multiple GPUs (e.g., 8 GPUs):

cd HunyuanVideo-Avatar

JOBS_DIR=$(dirname $(dirname "$0"))
export PYTHONPATH=./
export MODEL_BASE="./weights"
checkpoint_path=${MODEL_BASE}/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt

torchrun --nnodes=1 --nproc_per_node=8 --master_port 29605 hymm_sp/sample_batch.py \
    --input 'assets/test.csv' \
    --ckpt ${checkpoint_path} \
    --sample-n-frames 129 \
    --seed 128 \
    --image-size 704 \
    --cfg-scale 7.5 \
    --infer-steps 50 \
    --use-deepcache 1 \
    --flow-shift-eval-video 5.0 \
    --save-path ${OUTPUT_BASEPATH} 

Single-GPU Inference:

cd HunyuanVideo-Avatar

JOBS_DIR=$(dirname $(dirname "$0"))
export PYTHONPATH=./

export MODEL_BASE=./weights
OUTPUT_BASEPATH=./results-single
checkpoint_path=${MODEL_BASE}/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt

export DISABLE_SP=1 
CUDA_VISIBLE_DEVICES=0 python3 hymm_sp/sample_gpu_poor.py \
    --input 'assets/test.csv' \
    --ckpt ${checkpoint_path} \
    --sample-n-frames 129 \
    --seed 128 \
    --image-size 704 \
    --cfg-scale 7.5 \
    --infer-steps 50 \
    --use-deepcache 1 \
    --flow-shift-eval-video 5.0 \
    --save-path ${OUTPUT_BASEPATH} \
    --use-fp8 \
    --infer-min

Run with very low VRAM (e.g., 10GB VRAM with TeaCache):

cd HunyuanVideo-Avatar

JOBS_DIR=$(dirname $(dirname "$0"))
export PYTHONPATH=./

export MODEL_BASE=./weights
OUTPUT_BASEPATH=./results-poor

checkpoint_path=${MODEL_BASE}/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt

export CPU_OFFLOAD=1
CUDA_VISIBLE_DEVICES=0 python3 hymm_sp/sample_gpu_poor.py \
    --input 'assets/test.csv' \
    --ckpt ${checkpoint_path} \
    --sample-n-frames 129 \
    --seed 128 \
    --image-size 704 \
    --cfg-scale 7.5 \
    --infer-steps 50 \
    --use-deepcache 1 \
    --flow-shift-eval-video 5.0 \
    --save-path ${OUTPUT_BASEPATH} \
    --use-fp8 \
    --cpu-offload \
    --infer-min

Run a Gradio Server:

cd HunyuanVideo-Avatar

bash ./scripts/run_gradio.sh

Why Use HunyuanVideo-Avatar?

HunyuanVideo-Avatar offers a robust solution for generating high-quality, dynamic human animations from audio input. Its key advantages include:

  • High-Fidelity and Dynamic Video Generation: Produces realistic and natural videos with high-dynamic foreground and background, preserving strong character consistency.
  • Emotion-Controllable Animation: Allows precise control over facial emotions, driven by input audio and emotion reference images.
  • Multi-Character Support: Capable of animating multiple characters simultaneously, making it ideal for dialogue videos and complex scene creation.
  • Versatile Avatar Styles: Supports a wide range of avatar images, including photorealistic, cartoon, 3D-rendered, and anthropomorphic characters, at arbitrary scales and resolutions (portrait, upper-body, full-body).
  • Optimized for Performance: Includes features like DeepCache and TeaCache for efficient inference, even on single GPUs with limited VRAM (as low as 10GB).
  • Broad Applications: Suitable for various downstream tasks such as e-commerce, online streaming, social media video production, and general video content creation and editing.

Links

Related repositories

Similar repositories that may be relevant next.

LazyLLM: Low-Code Development for Multi-Agent LLM Applications

LazyLLM: Low-Code Development for Multi-Agent LLM Applications

July 2, 2026

LazyLLM offers a low-code development tool designed for building multi-agent LLM applications with ease. It simplifies the creation of complex AI applications, providing a streamlined workflow for rapid prototyping, data feedback, and iterative optimization. Developers can leverage its extensive features for deployment, cross-platform compatibility, and efficient model fine-tuning.

PythonAI DevelopmentMulti-Agent
ChatArena: Multi-Agent Language Game Environments for LLMs

ChatArena: Multi-Agent Language Game Environments for LLMs

July 1, 2026

ChatArena is a Python library designed to provide multi-agent language game environments for Large Language Models (LLMs), aiming to foster the development of communication and collaboration capabilities in AI. It offers a flexible framework for defining players, environments, and interactions based on Markov Decision Processes. Please note that as of August 11, 2025, this project has been deprecated due to a lack of widespread community use and is no longer receiving updates or support.

AILarge Language ModelsMulti-Agent Systems
Agentarium: A Python Framework for AI Agent Simulations

Agentarium: A Python Framework for AI Agent Simulations

July 1, 2026

Agentarium is an open-source Python framework designed for creating and managing simulations with AI-powered agents. It offers an intuitive platform for designing complex, interactive environments where agents can act, learn, and evolve. This powerful tool simplifies the orchestration of multiple AI agents and their interactions.

PythonAIAgents
Lighteval: Your All-in-One Toolkit for LLM Evaluation

Lighteval: Your All-in-One Toolkit for LLM Evaluation

July 1, 2026

Lighteval is a comprehensive toolkit from Hugging Face for evaluating Large Language Models (LLMs) across various backends. It enables users to dive deep into model performance by saving detailed, sample-by-sample results and supports over 1000 evaluation tasks. The framework offers extensive customization options, allowing users to create custom tasks and metrics tailored to their specific needs.

evaluationevaluation-frameworkevaluation-metrics

Source repository

Open the original repository on GitHub.

View on GitHub
OS
OSRepos

Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

Monitor your website with YourWebsiteScore

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of third-party repository code is at your own risk. Always review source code, dependencies, licenses, and security implications before running anything.

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️