Repository History
41 repositories tagged with Deep Learning

GLM-5: Flagship Models for Long-Horizon Agentic Engineering
GLM-5 is a series of flagship models (GLM-5.2, GLM-5.1, and GLM-5) developed by zai-org for complex systems engineering and long-horizon agentic tasks. The models offer advanced coding capabilities, long context lengths, and state-of-the-art performance on various benchmarks. They are designed to sustain effective problem-solving over extended sessions through iterative reasoning and strategy revision.

Qwen3-VL: A Powerful Multimodal Large Language Model Series
Qwen3-VL is a multimodal large language model series from Alibaba Cloud's Qwen team. It offers significant advancements in visual and text understanding, extended context length, and enhanced agent capabilities. The series is designed for flexible deployment, scaling from edge to cloud.

autoresearch: AI Agents for Autonomous LLM Training Research
autoresearch, by Andrej Karpathy, pioneers autonomous AI research by enabling agents to experiment with LLM training on a single GPU. The system allows an AI agent to modify code, train a model for a fixed 5-minute duration, and iteratively optimize for improved performance. This innovative approach aims to automate the experimental cycle of AI research, fostering continuous discovery and optimization.
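The modify, train, and compare cycle described above can be illustrated with a minimal sketch. This is not autoresearch's code: every name here (`train_for_budget`, `propose`, `research_loop`) is hypothetical, a toy quadratic objective stands in for LLM training, a random hyperparameter perturbation stands in for the agent's code edit, and the 5-minute budget is shrunk to a fraction of a second so the example runs quickly.

```python
import random
import time


def train_for_budget(lr, budget_s, max_steps=200):
    """Toy stand-in for a training run (assumption, not the real trainer).

    Runs gradient descent on f(w) = (w - 3)^2 until the wall-clock budget
    is used up or the step cap is reached, then returns the final loss.
    """
    w = 0.0
    deadline = time.monotonic() + budget_s
    steps = 0
    while time.monotonic() < deadline and steps < max_steps:
        w -= lr * 2.0 * (w - 3.0)  # gradient of (w - 3)^2 is 2 * (w - 3)
        steps += 1
    return (w - 3.0) ** 2  # lower is better


def propose(lr, rng):
    """Hypothetical 'agent edit': scale one hyperparameter up or down."""
    return lr * rng.choice([0.5, 0.8, 1.25, 2.0])


def research_loop(rounds=10, budget_s=0.05, seed=0):
    """Keep a candidate change only if it beats the best loss so far."""
    rng = random.Random(seed)
    best_lr = 0.001
    baseline_loss = train_for_budget(best_lr, budget_s)
    best_loss = baseline_loss
    for _ in range(rounds):
        candidate = propose(best_lr, rng)
        loss = train_for_budget(candidate, budget_s)
        if loss < best_loss:  # accept only improvements
            best_lr, best_loss = candidate, loss
    return best_lr, best_loss, baseline_loss


if __name__ == "__main__":
    lr, loss, baseline = research_loop()
    print(f"baseline loss {baseline:.4f}, best loss {loss:.4f} at lr={lr:.5f}")
```

Because each run gets the same fixed budget, candidates are compared on equal footing, and the accept-only-improvements rule means the best result never gets worse from one round to the next.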

GLM-OCR: Accurate, Fast, and Comprehensive Multimodal OCR Model
GLM-OCR is a powerful multimodal OCR model designed for complex document understanding, built on the GLM-V encoder-decoder architecture. It achieves state-of-the-art performance across various benchmarks, offering efficient inference and easy integration. This open-source solution is optimized for real-world business scenarios, providing robust and high-quality OCR capabilities.

AI Engineering from Scratch: A Comprehensive Hands-On AI Curriculum
The "AI Engineering from Scratch" repository provides a free, MIT-licensed curriculum for mastering AI engineering from foundational math to advanced agent systems. It emphasizes a hands-on approach, guiding learners to build every algorithm from scratch before utilizing frameworks. With 435 lessons across 20 phases, this project equips students with the practical skills needed to professionally build and deploy AI solutions.

Qwen3: Alibaba Cloud's Advanced Large Language Model Series
Qwen3 is a powerful series of large language models developed by the Qwen team at Alibaba Cloud. It offers advanced capabilities in reasoning, multilingual support, and long-context understanding, available in various sizes and modes for diverse applications. This repository provides comprehensive resources for running, deploying, and building with Qwen3 models.

AI-Scientist-v2: Automated Scientific Discovery via Agentic Tree Search
AI-Scientist-v2 is an advanced agentic system designed for automated scientific discovery, capable of generating hypotheses, running experiments, analyzing data, and writing scientific manuscripts. This system has successfully produced the first workshop paper written entirely by AI and accepted through peer review, marking a significant step towards fully autonomous research.

PyTorch Image Models (timm): The Ultimate Collection of Image Encoders
PyTorch Image Models (timm) is an extensive library offering the largest collection of PyTorch image encoders and backbones. It provides a wide array of state-of-the-art models, complete with pretrained weights, training, evaluation, and inference scripts. This makes it an invaluable resource for researchers and developers working with computer vision tasks in PyTorch.

Kapre: Keras Audio Preprocessors for Real-time GPU Processing
Kapre is a Python library that provides Keras layers for real-time audio preprocessing directly on GPUs. It enables efficient computation of STFTs, melspectrograms, and other audio features within your deep learning models. This integration simplifies model deployment, allows DSP parameters to be optimized, and keeps preprocessing more consistent than traditional pre-computation or custom implementations.

Kimi-k1.5: Scaling Reinforcement Learning with LLMs and Multimodality
Kimi-k1.5 introduces an o1-level multi-modal model that significantly advances reinforcement learning with Large Language Models. It demonstrates state-of-the-art performance in short-CoT tasks, outperforming leading models like GPT-4o and Claude Sonnet 3.5, and matches o1 performance in long-CoT scenarios across various modalities. This project highlights key innovations in long context scaling and improved policy optimization.

CoTracker: A Powerful Model for Tracking Any Point on a Video
CoTracker is a state-of-the-art model developed by Facebook AI Research and the University of Oxford, designed for tracking any point (pixel) across video sequences. This transformer-based solution offers fast, accurate, and quasi-dense point tracking capabilities. It is an invaluable tool for researchers and developers in computer vision, enabling precise analysis of motion in videos.

Spark-TTS: Efficient LLM-Based Text-to-Speech with Zero-Shot Voice Cloning
Spark-TTS is an advanced text-to-speech system that uses a large language model (LLM) for highly accurate and natural-sounding voice synthesis. Built on Qwen2.5, it offers streamlined efficiency, high-quality zero-shot voice cloning, bilingual support for Chinese and English, and controllable speech generation, making it suitable for both research and production.