Repository History

41 repositories tagged with Deep Learning

Topic: Deep Learning

parakeet-mlx: Nvidia's Parakeet ASR Models on Apple Silicon with MLX

parakeet-mlx is an open-source project that implements Nvidia's advanced Automatic Speech Recognition (ASR) Parakeet models for Apple Silicon, leveraging the MLX framework for optimized performance. This Python library offers both a command-line interface and a flexible Python API, enabling efficient transcription of audio files, including real-time streaming capabilities. It provides a powerful solution for developers and researchers working with speech processing on Apple hardware.

Analyzed Jan 29, 2026

View Details

CSM: A Conversational Speech Generation Model by SesameAILabs

CSM (Conversational Speech Model) is an advanced speech generation model from SesameAILabs, designed to create RVQ audio codes from text and audio inputs. It leverages a Llama backbone and a smaller audio decoder for Mimi audio codes, enabling high-quality, context-aware speech synthesis. The model is now natively available in Hugging Face Transformers, making it accessible for researchers and developers.

Analyzed Jan 11, 2026

View Details

Qwen3-Coder: Alibaba Cloud's Agentic Code LLM for Advanced Development

Qwen3-Coder is a powerful large language model series from Alibaba Cloud's Qwen team, specifically designed for agentic coding. It offers exceptional performance in coding and agentic tasks, boasting long-context capabilities and support for a vast array of programming languages. This model sets new state-of-the-art results among open models, comparable to leading commercial alternatives.

Analyzed Jan 3, 2026

View Details

big_vision: Google Research's Codebase for Large-Scale Vision Models

big_vision is Google Research's official codebase for training large-scale vision models using Jax/Flax. It has been instrumental in developing prominent architectures like Vision Transformer, SigLIP, and MLP-Mixer. This repository offers a robust starting point for researchers to conduct scalable vision experiments on GPUs and Cloud TPUs, scaling seamlessly from single cores to distributed setups.

Analyzed Dec 31, 2025

View Details

StreamDiffusion: Real-Time Interactive Generation with Diffusion Pipelines

StreamDiffusion is an innovative diffusion pipeline designed for real-time interactive generation, significantly enhancing the performance of current diffusion-based image generation techniques. It offers a pipeline-level solution to achieve high-speed image and text-to-image generation, making interactive AI experiences more accessible. This project introduces several key features to optimize computational efficiency and GPU utilization.

Analyzed Dec 13, 2025

View Details

Picotron: Minimalistic 4D-Parallelism Framework for LLM Training Education

Picotron is a minimalistic and hackable distributed training framework designed for educational purposes. Inspired by NanoGPT, it focuses on pre-training Llama-like models using 4D Parallelism, making complex concepts accessible. Its simple and readable codebase, with core files under 300 lines, provides an excellent tool for learning and experimentation in distributed machine learning.

Analyzed Dec 12, 2025

View Details

SyncTalk: High-Quality Talking Head Synthesis from CVPR 2024

SyncTalk is the official repository for a CVPR 2024 paper on talking head synthesis. This project focuses on generating highly synchronized lip movements, facial expressions, and stable head poses, while also restoring hair details for high-resolution video output. It leverages tri-plane hash representations to maintain subject identity effectively.

Analyzed Dec 11, 2025

View Details

VGGT: Visual Geometry Grounded Transformer for Rapid 3D Scene Reconstruction

VGGT, the recipient of the CVPR 2025 Best Paper Award, is a Visual Geometry Grounded Transformer developed by Facebook AI and the Visual Geometry Group at Oxford. This innovative feed-forward neural network efficiently infers key 3D scene attributes, including camera parameters, depth maps, and 3D point tracks, from single or multiple images within seconds. It offers a powerful solution for rapid 3D reconstruction and scene understanding.

Analyzed Dec 10, 2025

View Details

multiresolution-time-series-transformer: Long-term Forecasting with MTST

This repository provides a PyTorch implementation of the Multi-Resolution Time-Series Transformer (MTST) for long-term forecasting. Based on the Zhang et al. (2024) paper, MTST processes temporal data at different resolutions to effectively capture both short-term and long-term patterns. It offers a flexible and robust solution for advanced time series prediction tasks.

Analyzed Nov 30, 2025

View Details

deepface: Lightweight Face Recognition and Facial Attribute Analysis Library

deepface is a powerful yet lightweight Python library for face recognition and facial attribute analysis. It offers capabilities for age, gender, emotion, and race prediction, wrapping state-of-the-art models for robust performance. Developers can easily integrate advanced facial analysis into their applications with just a few lines of code.

Analyzed Nov 20, 2025

View Details

audio2photoreal: Synthesizing Photorealistic Codec Avatars from Audio

audio2photoreal is a powerful GitHub repository from Facebook Research that provides code and a dataset for generating photorealistic Codec Avatars driven solely from audio input. This project enables the synthesis of human embodiment in conversations, offering tools for training, testing, and running pretrained models to create lifelike digital representations. It represents a significant advancement in AI-driven computer graphics and virtual reality.

Analyzed Nov 20, 2025

View Details

mlx-examples: Practical Examples for the MLX Machine Learning Framework

mlx-examples is a comprehensive GitHub repository showcasing a variety of standalone examples built using the MLX framework. It provides practical implementations across text, image, audio, and multimodal models, serving as an excellent resource for developers exploring MLX. This collection helps users understand and apply MLX for diverse machine learning tasks.

Analyzed Nov 18, 2025

View Details

Previous Page 3 Next