Repository History

Explore all analyzed open-source repositories

Topic: Computer Vision

big_vision: Google Research's Codebase for Large-Scale Vision Models

big_vision is Google Research's official codebase for training large-scale vision models with JAX/Flax. It was used to develop architectures such as Vision Transformer, SigLIP, and MLP-Mixer. The repository gives researchers a starting point for scalable vision experiments on GPUs and Cloud TPUs, scaling from a single core to distributed setups.

Dec 31, 2025

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation

HunyuanVideo-Avatar is a Tencent-Hunyuan project for high-fidelity, audio-driven human animation. It uses a multimodal diffusion transformer to generate dynamic, emotion-controllable, multi-character dialogue videos. The system targets three challenges: character consistency, emotion alignment, and multi-character animation. Intended applications include e-commerce and social media.

Dec 30, 2025

OmniParser: A Vision-Based Tool for GUI Agent Screen Parsing

OmniParser is a Microsoft tool that parses user interface screenshots into structured elements. This helps vision-based models such as GPT-4V generate actions that are accurately grounded in specific regions of a GUI. The project aims to advance pure vision-based GUI agents through reliable screen parsing.

Dec 28, 2025

CineScale: Unlocking 4K High-Resolution Cinematic Video Generation

CineScale is a repository from Eyeline-Labs that extends FreeScale to high-resolution cinematic video generation. It provides diffusion-based models and tools that produce video at up to 4K resolution. The project is aimed at researchers and developers working on high-definition video generation.

Dec 18, 2025

StreamDiffusion: Real-Time Interactive Generation with Diffusion Pipelines

StreamDiffusion is a diffusion pipeline for real-time interactive generation. It is a pipeline-level solution that speeds up existing diffusion-based image generation, supporting high-speed image-to-image and text-to-image generation. The project introduces several features that improve computational efficiency and GPU utilization.

Dec 13, 2025

SyncTalk: High-Quality Talking Head Synthesis from CVPR 2024

SyncTalk is the official repository for a CVPR 2024 paper on talking head synthesis. The project generates tightly synchronized lip movements, facial expressions, and stable head poses, and restores hair details for high-resolution video output. It uses tri-plane hash representations to preserve subject identity.

Dec 11, 2025

VGGT: Visual Geometry Grounded Transformer for Rapid 3D Scene Reconstruction

VGGT, which received the CVPR 2025 Best Paper Award, is a Visual Geometry Grounded Transformer developed by Meta AI and the Visual Geometry Group at the University of Oxford. This feed-forward neural network infers key 3D scene attributes, including camera parameters, depth maps, and 3D point tracks, from one or more images within seconds. It is designed for rapid 3D reconstruction and scene understanding.

Dec 10, 2025

deepface: Lightweight Face Recognition and Facial Attribute Analysis Library

deepface is a lightweight Python library for face recognition and facial attribute analysis. It predicts age, gender, emotion, and race by wrapping state-of-the-art models. Developers can add facial analysis to their applications with a few lines of code.

Nov 20, 2025

audio2photoreal: Synthesizing Photorealistic Codec Avatars from Audio

audio2photoreal is a repository from Facebook Research that provides code and a dataset for generating photorealistic Codec Avatars driven solely by audio input. The project synthesizes human embodiment in conversations and includes tools for training, testing, and running pretrained models.

Nov 20, 2025

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

LivePortrait is the official PyTorch implementation of an efficient portrait animation method that animates still images and videos with stitching and retargeting control. It supports both human and animal subjects and offers features such as an image-driven mode, regional control, and precise editing. LivePortrait has been adopted by major video platforms.

Oct 12, 2025

Leffa: Controllable Person Image Generation with Flow Fields in Attention

Leffa is a unified framework for controllable person image generation. It controls appearance through virtual try-on and pose through pose transfer. The project addresses distortion of fine-grained textural detail by learning flow fields in attention, which guide each target query to attend to the correct reference key. It reports state-of-the-art performance, maintaining high image quality while significantly reducing detail distortion.

Oct 12, 2025
