# audio2photoreal: Synthesizing Photorealistic Codec Avatars from Audio

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/facebookresearch-audio2photoreal
Generated for open source discovery and AI-assisted research.

audio2photoreal is a GitHub repository from Facebook Research that provides code and a dataset for generating photorealistic Codec Avatars driven by audio input. The project synthesizes human embodiment in conversations and includes tools for training, testing, and running pretrained models. It targets applications in AI-driven computer graphics and virtual reality.

GitHub: https://github.com/facebookresearch/audio2photoreal

## Topics

- Python
- AI
- Computer Vision
- Generative AI
- Photorealistic Avatars
- Speech Synthesis
- Deep Learning

## Repository Information

Last analyzed by OSRepos: Thu Nov 20 2025 12:01:02 GMT+0000 (Western European Standard Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Content

## Introduction

The `audio2photoreal` repository from Facebook Research synthesizes photorealistic Codec Avatars directly from audio input. This PyTorch implementation provides the code and dataset needed to generate lifelike human embodiment in conversational settings, along with tools for training, testing, and running pretrained models. Researchers and developers can use it to explore applications in computer graphics, AI, and virtual reality.

## Installation

To get started with `audio2photoreal`, follow these steps for a quick setup and demo run. Make sure you have CUDA 11.7 and gcc/g++ 9.0, which are needed for PyTorch3D compatibility.
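The toolchain requirement above can be checked before starting the install. The following is a minimal sketch, not part of the repository: the helper names (`parse_nvcc_release`, `parse_gcc_version`, `installed_version`, `toolchain_matches`) are hypothetical, and it assumes `nvcc` and `gcc` report their versions in the usual `--version` output format.

```python
import re
import shutil
import subprocess

# Versions named in the setup notes above.
EXPECTED_CUDA = (11, 7)
EXPECTED_GCC_MAJOR = 9


def parse_nvcc_release(text):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_gcc_version(text):
    """Extract (major, minor, patch) from the first X.Y.Z found in `gcc --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_version(command, parser):
    """Run a version command and parse its output; return None if the tool is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return parser(result.stdout + result.stderr)


def toolchain_matches(cuda_version, gcc_version):
    """True when both versions were found and match the expected CUDA release and gcc major."""
    return (
        cuda_version is not None
        and gcc_version is not None
        and cuda_version == EXPECTED_CUDA
        and gcc_version[0] == EXPECTED_GCC_MAJOR
    )
```

To use it, call `installed_version(["nvcc", "--version"], parse_nvcc_release)` and `installed_version(["gcc", "--version"], parse_gcc_version)`, then pass both results to `toolchain_matches`. A `False` result only means the local versions differ from the ones listed here, not that the install will necessarily fail.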

First, create a Conda environment and install the necessary components, which include environment configuration, rendering assets, prerequisite models, and pretrained models:

```bash
conda create --name a2p_env python=3.9
conda activate a2p_env
sh demo/install.sh
```


Once the installation is complete, you can run the interactive demo:

```bash
python -m demo.demo
```


This demo allows you to record audio and then render corresponding photorealistic videos.

## Examples

The interactive demo is the quickest way to try the project: record an audio clip and it renders a video of a photorealistic avatar speaking and moving in sync with your voice.

For more advanced usage, you can generate face codes and body poses independently using the pretrained models. For instance, to generate face codes for a participant like `PXB184`:

```bash
python -m sample.generate --model_path checkpoints/diffusion/c1_face/model000155000.pt --num_samples 10 --num_repetitions 5 --timestep_respacing ddim500 --guidance_param 10.0
```
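When several generation runs share most of their flags, it can help to assemble the command in a script. The sketch below rebuilds the face-code command shown above from its parts. `build_generate_command` is a hypothetical helper, not something the repository provides; the flags and values are copied from that command, and nothing is executed here.

```python
import shlex


def build_generate_command(model_path, guidance_param, num_samples=10,
                           num_repetitions=5, timestep_respacing="ddim500",
                           extra_args=()):
    """Assemble the argument list for `python -m sample.generate` (hypothetical helper)."""
    command = [
        "python", "-m", "sample.generate",
        "--model_path", model_path,
        "--num_samples", str(num_samples),
        "--num_repetitions", str(num_repetitions),
        "--timestep_respacing", timestep_respacing,
        "--guidance_param", str(guidance_param),
    ]
    # Any further flags (for example those used for pose generation) are appended as-is.
    command.extend(extra_args)
    return command


face_command = build_generate_command(
    "checkpoints/diffusion/c1_face/model000155000.pt", guidance_param=10.0
)
# Print the command as a single shell-safe string for inspection.
print(shlex.join(face_command))
```

To actually launch the run, the list can be passed to `subprocess.run(face_command, check=True)` from the repository root, inside the activated Conda environment.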


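After generating face codes, you can generate body poses with the pose model. To visualize the full photorealistic avatar, add `--plot` and pass the saved face codes with `--face_codes`; precomputed poses can be supplied with `--pose_codes`, as in the command below: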

```bash
python -m sample.generate --model_path checkpoints/diffusion/c1_pose/model000340000.pt --resume_trans checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt --num_samples 10 --num_repetitions 5 --timestep_respacing ddim500 --guidance_param 2.0 --face_codes ./checkpoints/diffusion/c1_face/samples_c1_face_000155000_seed10_/results.npy --pose_codes ./checkpoints/diffusion/c1_pose/samples_c1_pose_000340000_seed10_guide_iter-0100000.pt/results.npy --plot
```
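The command above reads four files: two checkpoints and two `results.npy` files that presumably come from earlier generation runs. A small pre-flight check can catch a missing file before a long run starts. This is a sketch under that assumption; `missing_inputs` and `REQUIRED_INPUTS` are illustrative names, and the paths are copied from the command.

```python
from pathlib import Path

# Paths taken from the pose generation command above, relative to the repository root.
REQUIRED_INPUTS = [
    "checkpoints/diffusion/c1_pose/model000340000.pt",
    "checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt",
    "checkpoints/diffusion/c1_face/samples_c1_face_000155000_seed10_/results.npy",
    "checkpoints/diffusion/c1_pose/samples_c1_pose_000340000_seed10_guide_iter-0100000.pt/results.npy",
]


def missing_inputs(paths, root="."):
    """Return the relative paths that do not exist as regular files under root."""
    return [path for path in paths if not (Path(root) / path).is_file()]


if __name__ == "__main__":
    missing = missing_inputs(REQUIRED_INPUTS)
    if missing:
        print("Missing inputs:")
        for path in missing:
            print("  " + path)
    else:
        print("All inputs found.")
```

Run it from the repository root. If only the `results.npy` entries are missing, the earlier generation steps have probably not been run yet or wrote to a different directory.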


For an immediate hands-on experience without local setup, try the [official Colab demo](https://colab.research.google.com/drive/1A6WKM3PeX7dcKV66zxQWuP-v_dKlX_0?usp=sharing).

## Why Use audio2photoreal?

`audio2photoreal` provides a framework for synthesizing human embodiment from audio, which makes it useful in several ways:

*   **Cutting-edge Research:** It offers a solid foundation for researchers in computer vision, graphics, and AI to build upon and advance the state-of-the-art in digital human creation.
*   **Realistic Digital Humans:** The project's ability to create highly convincing avatars driven by speech has implications for virtual assistants, realistic video conferencing, and immersive virtual reality experiences.
*   **Comprehensive Toolkit:** With train and test code, pretrained models, and access to a dataset, it provides a complete ecosystem for both experimentation and development.
*   **Open-Source Contribution:** As a Facebook Research project, it contributes valuable open-source resources to the community, fostering innovation in the field.

## Links

*   **GitHub Repository:** [https://github.com/facebookresearch/audio2photoreal](https://github.com/facebookresearch/audio2photoreal)
*   **Research Paper:** ["From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations"](https://arxiv.org/abs/2401.01885)
*   **Colab Demo:** [Try the interactive demo on Google Colab](https://colab.research.google.com/drive/1A6WKM3PeX7dcKV66zxQWuP-v_dKlX_0?usp=sharing)