audio2photoreal: Synthesizing Photorealistic Codec Avatars from Audio

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

audio2photoreal is a GitHub repository from Facebook Research that provides code and a dataset for generating photorealistic Codec Avatars driven solely by audio input. The project synthesizes human embodiment in conversations and includes tools for training, testing, and running pretrained models. It targets work in AI-driven computer graphics and virtual reality.

Repository Information

Analyzed by OSRepos on November 20, 2025

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

The audio2photoreal repository from Facebook Research is a PyTorch implementation for synthesizing photorealistic Codec Avatars directly from audio input. It provides the code and dataset needed to generate human embodiment in conversational settings, along with tools for training, testing, and running pretrained models. Researchers and developers can use this work to explore applications in computer graphics, AI, and virtual reality.

Installation

To get started with audio2photoreal, follow these steps for a quick setup and demo run. The setup requires CUDA 11.7 and gcc/g++ 9.0 for PyTorch3D compatibility.
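Before installing, it can help to confirm which toolchain versions are on the machine. The following is a minimal sketch, not part of the repository: the helper names (`parse_version`, `tool_version`) and the regular expressions are assumptions made for illustration, and the gcc pattern only matches the usual `gcc (...) X.Y.Z` banner.

```python
import re
import shutil
import subprocess

# Hypothetical patterns for the version banners printed by each tool.
NVCC_PATTERN = r"release (\d+)\.(\d+)"  # e.g. "release 11.7, V11.7.99"
GCC_PATTERN = r"\)\s+(\d+)\.(\d+)"      # e.g. "gcc (GCC) 9.3.0"


def parse_version(text, pattern):
    """Return (major, minor) found in text, or None if nothing matches."""
    match = re.search(pattern, text)
    return tuple(int(part) for part in match.groups()) if match else None


def tool_version(cmd, pattern):
    """Run a version command and parse its output; None if the tool is absent."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_version(result.stdout, pattern)


if __name__ == "__main__":
    # Versions named in the installation notes: CUDA 11.7 and gcc/g++ 9.x.
    print("CUDA (nvcc):", tool_version(["nvcc", "--version"], NVCC_PATTERN))
    print("gcc:", tool_version(["gcc", "--version"], GCC_PATTERN))
```

A result of `None` means the tool was not found on `PATH` or its banner did not match the assumed pattern, so the version should be checked by hand in that case.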

First, create a Conda environment and install the necessary components, which include environment configuration, rendering assets, prerequisite models, and pretrained models:

conda create --name a2p_env python=3.9
conda activate a2p_env
sh demo/install.sh

Once the installation is complete, you can run the interactive demo:

python -m demo.demo

This demo allows you to record audio and then render corresponding photorealistic videos.

Examples

The quickest way to try the project is the interactive demo, where you record an audio clip and generate a video of a photorealistic avatar speaking and moving in sync with your voice. You can also generate face and body movements separately.

For more advanced usage, you can generate face codes and body poses independently using the pretrained models. For instance, to generate face codes for a participant like PXB184:

python -m sample.generate --model_path checkpoints/diffusion/c1_face/model000155000.pt --num_samples 10 --num_repetitions 5 --timestep_respacing ddim500 --guidance_param 10.0
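Because the sampling commands in this section are long, it may be convenient to assemble them in a script. The sketch below is an illustration only: `build_generate_cmd` is a hypothetical helper, and it uses only the flags that appear in the commands in this section (`--resume_trans` is used by the pose command).

```python
import shlex
import sys


def build_generate_cmd(model_path, guidance_param, num_samples=10,
                       num_repetitions=5, timestep_respacing="ddim500",
                       resume_trans=None):
    """Build the argument list for `python -m sample.generate`."""
    cmd = [sys.executable, "-m", "sample.generate", "--model_path", model_path]
    if resume_trans is not None:
        # Only the pose command in this section passes --resume_trans.
        cmd += ["--resume_trans", resume_trans]
    cmd += [
        "--num_samples", str(num_samples),
        "--num_repetitions", str(num_repetitions),
        "--timestep_respacing", timestep_respacing,
        "--guidance_param", str(guidance_param),
    ]
    return cmd


# Same settings as the face-code command above.
face_cmd = build_generate_cmd(
    "checkpoints/diffusion/c1_face/model000155000.pt", guidance_param=10.0
)
print(shlex.join(face_cmd))
```

To execute the command rather than print it, pass the list to `subprocess.run(face_cmd, check=True)` from the repository root, assuming the environment and checkpoints from the installation step are in place.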

After generating face codes, you can generate body poses. In the command below, the --face_codes, --pose_codes, and --plot arguments optionally combine the two results for a full photorealistic avatar visualization:

python -m sample.generate --model_path checkpoints/diffusion/c1_pose/model000340000.pt --resume_trans checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt --num_samples 10 --num_repetitions 5 --timestep_respacing ddim500 --guidance_param 2.0 --face_codes ./checkpoints/diffusion/c1_face/samples_c1_face_000155000_seed10_/results.npy --pose_codes ./checkpoints/diffusion/c1_pose/samples_c1_pose_000340000_seed10_guide_iter-0100000.pt/results.npy --plot
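The command above depends on several checkpoint and result files, so a quick existence check before launching it can save a failed run. This is a minimal sketch with a hypothetical helper (`missing_files`); the paths are copied from the command above and are assumed to be relative to the repository root.

```python
from pathlib import Path

# Paths taken verbatim from the visualization command above.
REQUIRED_FILES = [
    "checkpoints/diffusion/c1_pose/model000340000.pt",
    "checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt",
    "checkpoints/diffusion/c1_face/samples_c1_face_000155000_seed10_/results.npy",
    "checkpoints/diffusion/c1_pose/"
    "samples_c1_pose_000340000_seed10_guide_iter-0100000.pt/results.npy",
]


def missing_files(root, relative_paths):
    """Return the relative paths that do not exist as files under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(".", REQUIRED_FILES)
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All required files are present.")
```

If the two `results.npy` files are reported missing, the face and pose sampling commands have most likely not been run yet.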

For an immediate hands-on experience without local setup, try the official Colab demo.

Why Use audio2photoreal?

audio2photoreal provides a framework for synthesizing human embodiment from audio, which makes it useful in several ways:

  • Research foundation: Researchers in computer vision, graphics, and AI can build on it to advance digital human creation.
  • Realistic digital humans: Speech-driven avatars have implications for virtual assistants, video conferencing, and immersive virtual reality experiences.
  • Complete toolkit: Train and test code, pretrained models, and access to a dataset support both experimentation and development.
  • Open-source contribution: As a Facebook Research project, it makes these resources available to the community.
