Repository History

41 repositories tagged with Deep Learning

Topic: Deep Learning

GLM-5: Flagship Models for Long-Horizon Agentic Engineering

GLM-5 is a series of flagship models from zai-org, comprising GLM-5.2, GLM-5.1, and GLM-5, aimed at complex systems engineering and long-horizon agentic tasks. The models emphasize coding ability and long context windows, and the project reports state-of-the-art results on several benchmarks. They are designed to sustain problem-solving over extended sessions through iterative reasoning and strategy revision.

Analyzed Jun 18, 2026

Qwen3-VL: A Powerful Multimodal Large Language Model Series

Qwen3-VL is a multimodal large language model series from Alibaba Cloud's Qwen team. It improves visual and text understanding, extends context length, and strengthens agent capabilities. The series is designed for flexible deployment, scaling from edge to cloud.

Analyzed Jun 15, 2026

autoresearch: AI Agents for Autonomous LLM Training Research

autoresearch, by Andrej Karpathy, lets AI agents run LLM training experiments autonomously on a single GPU. An agent modifies the code, trains a model for a fixed 5-minute duration, and iterates to improve performance. The aim is to automate the experimental cycle of AI research.
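
The modify, train, and evaluate cycle described above can be sketched roughly as follows. This is an illustrative toy, not code from the autoresearch repository: every name here (train_and_evaluate, propose_change, research_loop) is hypothetical, the "model" is a one-parameter quadratic, the "agent edit" is a random change to the learning rate, and a fixed step budget stands in for the fixed 5-minute wall-clock budget so that runs are repeatable.

```python
import random


def train_and_evaluate(learning_rate: float, step_budget: int) -> float:
    """Toy stand-in for one fixed-budget training run.

    Minimises f(w) = (w - 3)^2 with plain gradient descent and returns
    the final loss (lower is better).
    """
    w = 0.0
    for _ in range(step_budget):
        grad = 2.0 * (w - 3.0)
        w -= learning_rate * grad
    return (w - 3.0) ** 2


def propose_change(learning_rate: float, rng: random.Random) -> float:
    """Hypothetical 'agent edit': rescale a single hyperparameter."""
    return learning_rate * rng.choice([0.5, 0.8, 1.25, 2.0])


def research_loop(rounds: int = 20, step_budget: int = 10, seed: int = 0):
    """Repeat propose -> train -> evaluate, keeping only improvements."""
    rng = random.Random(seed)
    best_lr = 0.01
    best_loss = train_and_evaluate(best_lr, step_budget)
    history = [best_loss]
    for _ in range(rounds):
        candidate_lr = propose_change(best_lr, rng)
        candidate_loss = train_and_evaluate(candidate_lr, step_budget)
        if candidate_loss < best_loss:
            # Accept the change; otherwise the previous best is kept.
            best_lr, best_loss = candidate_lr, candidate_loss
        history.append(best_loss)
    return best_lr, best_loss, history


if __name__ == "__main__":
    lr, loss, history = research_loop()
    print(f"best learning rate: {lr:.4f}, best loss: {loss:.6f}")
```

Because every run gets the same budget, results from different candidate changes are directly comparable, which is what makes a simple keep-if-better rule workable in this sketch.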

Analyzed May 31, 2026

GLM-OCR: Accurate, Fast, and Comprehensive Multimodal OCR Model

GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder-decoder architecture. The project reports state-of-the-art results on several benchmarks, along with efficient inference and straightforward integration. This open-source model is optimized for real-world business scenarios.

Analyzed May 28, 2026

AI Engineering from Scratch: A Comprehensive Hands-On AI Curriculum

The "AI Engineering from Scratch" repository provides a free, MIT-licensed curriculum covering AI engineering from foundational math to advanced agent systems. It takes a hands-on approach: learners build every algorithm from scratch before using frameworks. With 435 lessons across 20 phases, it aims to teach the practical skills needed to build and deploy AI systems professionally.

Analyzed May 25, 2026

Qwen3: Alibaba Cloud's Advanced Large Language Model Series

Qwen3 is a series of large language models developed by the Qwen team at Alibaba Cloud. It supports reasoning, multilingual use, and long-context understanding, and is available in various sizes and modes for different applications. This repository provides resources for running, deploying, and building with Qwen3 models.

Analyzed May 10, 2026

AI-Scientist-v2: Automated Scientific Discovery via Agentic Tree Search

AI-Scientist-v2 is an agentic system for automated scientific discovery that generates hypotheses, runs experiments, analyzes data, and writes scientific manuscripts. According to the project, it produced the first workshop paper written entirely by AI and accepted through peer review, a step towards fully autonomous research.

Analyzed May 9, 2026

PyTorch Image Models (timm): The Ultimate Collection of Image Encoders

PyTorch Image Models (timm) is a library described by its maintainers as the largest collection of PyTorch image encoders and backbones. It provides a wide range of models with pretrained weights, along with scripts for training, evaluation, and inference. This makes it a practical resource for researchers and developers working on computer vision tasks in PyTorch.

Analyzed May 5, 2026

Kapre: Keras Audio Preprocessors for Real-time GPU Processing

Kapre is a Python library that provides Keras layers for real-time audio preprocessing directly on GPUs. It computes STFTs, mel-spectrograms, and other audio features inside your deep learning models. Keeping preprocessing in the model simplifies deployment, allows DSP parameters to be optimized, and avoids the inconsistencies that can arise with pre-computed features or custom implementations.

Analyzed May 3, 2026

Kimi-k1.5: Scaling Reinforcement Learning with LLMs and Multimodality

Kimi-k1.5 is a multimodal model trained with reinforcement learning that its authors describe as o1-level. It reports state-of-the-art performance on short-CoT tasks, outperforming models such as GPT-4o and Claude Sonnet 3.5, and matches o1 performance on long-CoT tasks across multiple modalities. The project's key innovations are long-context scaling and improved policy optimization.

Analyzed Apr 17, 2026

CoTracker: A Powerful Model for Tracking Any Point on a Video

CoTracker is a transformer-based model from Facebook AI Research and the University of Oxford for tracking any point (pixel) across a video sequence. It offers fast, accurate, quasi-dense point tracking. This makes it useful to computer vision researchers and developers who need precise motion analysis in videos.

Analyzed Apr 11, 2026

Spark-TTS: Efficient LLM-Based Text-to-Speech with Zero-Shot Voice Cloning

Spark-TTS is a text-to-speech system that uses large language models (LLMs) for accurate, natural-sounding voice synthesis. Built on Qwen2.5, it offers an efficient pipeline, high-quality zero-shot voice cloning, bilingual support for Chinese and English, and controllable speech generation, making it suitable for both research and production.

Analyzed Apr 5, 2026