Repository History

Explore all analyzed open-source repositories

Topic: Deep Learning

big_vision: Google Research's Codebase for Large-Scale Vision Models

big_vision is Google Research's official codebase for training large-scale vision models with JAX/Flax. It was used to develop prominent architectures such as the Vision Transformer (ViT), SigLIP, and MLP-Mixer. The repository offers a solid starting point for researchers running scalable vision experiments on GPUs and Cloud TPUs, and it scales from a single core to large distributed setups.

Dec 31, 2025

StreamDiffusion: Real-Time Interactive Generation with Diffusion Pipelines

StreamDiffusion is a diffusion pipeline designed for real-time interactive generation that substantially improves the throughput of existing diffusion-based image generation. It offers a pipeline-level solution for high-speed image-to-image and text-to-image generation, making interactive AI experiences more accessible. The project introduces several techniques to improve computational efficiency and GPU utilization, including Stream Batch denoising, Residual Classifier-Free Guidance (RCFG), and a Stochastic Similarity Filter.

Dec 13, 2025

Picotron: Minimalistic 4D-Parallelism Framework for LLM Training Education

Picotron is a minimalistic, hackable distributed training framework designed for education. Inspired by nanoGPT, it focuses on pre-training Llama-like models with 4D parallelism (data, tensor, pipeline, and context parallelism), making these complex concepts accessible. Its simple, readable codebase, with each core file under 300 lines, makes it an excellent tool for learning and experimenting with distributed machine learning.

Dec 12, 2025
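To make the 4D layout concrete, here is a minimal sketch, not Picotron's own code, of how a flat global rank can be mapped onto a (data, pipeline, context, tensor) grid, and how the ranks forming one communication group along a given axis can be found. The axis order (tensor parallelism varying fastest) and all function names are assumptions chosen for illustration; Picotron's actual process-group setup may differ.

```python
from typing import Dict, List

# Hypothetical axis order for the rank grid; the last axis (tensor parallel)
# varies fastest, so tensor-parallel peers get consecutive ranks.
AXES = ("dp", "pp", "cp", "tp")


def world_size(sizes: Dict[str, int]) -> int:
    """Total number of ranks implied by the per-axis sizes."""
    total = 1
    for axis in AXES:
        total *= sizes[axis]
    return total


def rank_to_coords(rank: int, sizes: Dict[str, int]) -> Dict[str, int]:
    """Map a flat global rank to its coordinate along each parallelism axis."""
    if not 0 <= rank < world_size(sizes):
        raise ValueError(f"rank {rank} is outside the grid")
    coords = {}
    # Mixed-radix decomposition: peel off the fastest-varying axis first.
    for axis in reversed(AXES):
        coords[axis] = rank % sizes[axis]
        rank //= sizes[axis]
    return coords


def coords_to_rank(coords: Dict[str, int], sizes: Dict[str, int]) -> int:
    """Inverse of rank_to_coords."""
    rank = 0
    for axis in AXES:
        rank = rank * sizes[axis] + coords[axis]
    return rank


def group_members(rank: int, sizes: Dict[str, int], axis: str) -> List[int]:
    """Ranks sharing every coordinate with `rank` except the one on `axis`.

    These ranks would form the communication group for that axis,
    e.g. the tensor-parallel group when axis == "tp".
    """
    base = rank_to_coords(rank, sizes)
    members = []
    for i in range(sizes[axis]):
        coords = dict(base)
        coords[axis] = i
        members.append(coords_to_rank(coords, sizes))
    return members


if __name__ == "__main__":
    # 8 GPUs: 2-way data, 2-way pipeline, no context, 2-way tensor parallelism.
    sizes = {"dp": 2, "pp": 2, "cp": 1, "tp": 2}
    for r in range(world_size(sizes)):
        print(r, rank_to_coords(r, sizes), "tp group:", group_members(r, sizes, "tp"))
```

With this ordering, rank 5 on the 8-GPU grid sits at dp=1, pp=0, cp=0, tp=1 and shares its tensor-parallel group with rank 4. Keeping tensor-parallel peers adjacent is a common choice because that axis communicates most often and benefits from sitting on the same node.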

SyncTalk: High-Quality Talking Head Synthesis from CVPR 2024

SyncTalk is the official repository for a CVPR 2024 paper on talking head synthesis. This project focuses on generating highly synchronized lip movements, facial expressions, and stable head poses, while also restoring hair details for high-resolution video output. It leverages tri-plane hash representations to maintain subject identity effectively.

Dec 11, 2025

VGGT: Visual Geometry Grounded Transformer for Rapid 3D Scene Reconstruction

VGGT (Visual Geometry Grounded Transformer), winner of the CVPR 2025 Best Paper Award, was developed by Meta AI and the Visual Geometry Group (VGG) at the University of Oxford. This feed-forward neural network directly infers key 3D scene attributes, including camera parameters, depth maps, point maps, and 3D point tracks, from one or many views of a scene within seconds. It offers a powerful solution for rapid 3D reconstruction and scene understanding.

Dec 10, 2025

multiresolution-time-series-transformer: Long-term Forecasting with MTST

This repository provides a PyTorch implementation of the Multi-Resolution Time-Series Transformer (MTST) for long-term forecasting. Based on Zhang et al. (2024), MTST processes the input series at several temporal resolutions in parallel, which lets it capture both short-term fluctuations and long-term patterns. It offers a flexible solution for advanced time-series prediction tasks.

Nov 30, 2025
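As a minimal sketch of the multi-resolution idea, and not the repository's implementation, the functions below split one univariate series into non-overlapping patches at several patch lengths, so each branch of a patch-based model would see the same history at a different granularity. The patch lengths, the function names, and the choice to drop an incomplete trailing patch are assumptions for illustration; MTST itself may use strides, overlaps, or padding.

```python
from typing import Dict, List, Sequence


def patchify(series: Sequence[float], patch_len: int) -> List[List[float]]:
    """Split a series into non-overlapping patches of length patch_len.

    Trailing time steps that do not fill a whole patch are dropped
    (an assumption for this sketch; padding is another common choice).
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    n_patches = len(series) // patch_len
    return [list(series[i * patch_len:(i + 1) * patch_len]) for i in range(n_patches)]


def multi_resolution_tokens(
    series: Sequence[float], patch_lens: Sequence[int]
) -> Dict[int, List[List[float]]]:
    """Tokenize the same history once per resolution.

    Short patches keep fine-grained, short-term detail; long patches give
    fewer tokens that each cover a longer stretch of the series.
    """
    return {p: patchify(series, p) for p in patch_lens}


if __name__ == "__main__":
    history = [float(t) for t in range(12)]
    for patch_len, tokens in multi_resolution_tokens(history, [2, 4, 8]).items():
        print(f"patch_len={patch_len}: {len(tokens)} tokens")
```

Here, 12 time steps yield 6, 3, and 1 tokens at patch lengths 2, 4, and 8, and the length-8 view drops the last four steps. In a patch-based Transformer, each token would typically be embedded and passed to the encoder for its resolution.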

deepface: Lightweight Face Recognition and Facial Attribute Analysis Library

deepface is a powerful yet lightweight Python library for face recognition and facial attribute analysis. It offers capabilities for age, gender, emotion, and race prediction, wrapping state-of-the-art models for robust performance. Developers can easily integrate advanced facial analysis into their applications with just a few lines of code.

Nov 20, 2025

audio2photoreal: Synthesizing Photorealistic Codec Avatars from Audio

audio2photoreal is a GitHub repository from Facebook Research that provides code and a dataset for generating photorealistic Codec Avatars driven solely by audio. The project synthesizes human embodiment in conversations and includes tools for training, testing, and running pretrained models to create lifelike digital humans. It represents a significant advance in AI-driven computer graphics and virtual reality.

Nov 20, 2025

mlx-examples: Practical Examples for the MLX Machine Learning Framework

mlx-examples is a comprehensive GitHub repository showcasing a variety of standalone examples built using the MLX framework. It provides practical implementations across text, image, audio, and multimodal models, serving as an excellent resource for developers exploring MLX. This collection helps users understand and apply MLX for diverse machine learning tasks.

Nov 18, 2025

InfiniteTalk: Unlimited-Length AI Video Generation from Audio or Images

InfiniteTalk is an AI model for generating talking videos of unlimited length. It creates realistic video from audio and supports both image-to-video and video-to-video generation. The framework maintains accurate lip synchronization and consistent identity, aligning head movements, body posture, and facial expressions with the input audio.

Nov 13, 2025

Gradio: Build and Share Machine Learning Apps in Python

Gradio is an open-source Python library that simplifies the creation and sharing of interactive web applications for machine learning models, APIs, or any Python function. It allows developers to quickly build user interfaces without needing JavaScript, CSS, or web hosting expertise, offering a straightforward way to demo AI projects. With Gradio, you can transform your Python functions into shareable web demos in just a few lines of code.

Oct 31, 2025

LitServe: Build Custom Inference Engines for AI Models

LitServe is a framework from Lightning AI that helps developers build custom inference engines for a wide range of AI models and systems. It gives fine-grained control over serving and supports agents, multi-modal systems, RAG, and pipelines without the typical MLOps overhead. The framework offers a flexible, efficient way to deploy AI models, whether self-hosted or managed on the Lightning AI platform.

Oct 29, 2025
