Repository History

Explore all analyzed open source repositories

Topic: Deep Learning
Qwen3: Alibaba Cloud's Advanced Large Language Model Series

Qwen3 is a powerful series of large language models developed by the Qwen team at Alibaba Cloud. It offers advanced capabilities in reasoning, multilingual support, and long-context understanding, available in various sizes and modes for diverse applications. This repository provides comprehensive resources for running, deploying, and building with Qwen3 models.

May 10, 2026

AI-Scientist-v2: Automated Scientific Discovery via Agentic Tree Search

AI-Scientist-v2 is an advanced agentic system designed for automated scientific discovery, capable of generating hypotheses, running experiments, analyzing data, and writing scientific manuscripts. This system has successfully produced the first workshop paper written entirely by AI and accepted through peer review, marking a significant step towards fully autonomous research.

May 9, 2026

PyTorch Image Models (timm): The Ultimate Collection of Image Encoders

PyTorch Image Models (timm) is an extensive library offering the largest collection of PyTorch image encoders and backbones. It provides a wide array of state-of-the-art models, complete with pretrained weights, training, evaluation, and inference scripts. This makes it an invaluable resource for researchers and developers working with computer vision tasks in PyTorch.

May 5, 2026

Kapre: Keras Audio Preprocessors for Real-time GPU Processing

Kapre is a Python library that provides Keras layers for real-time audio preprocessing on the GPU. It computes STFTs, mel-spectrograms, and other audio features inside the deep learning model itself. Because preprocessing is part of the model, deployment is simpler, DSP parameters can be optimized, and preprocessing stays consistent, compared with precomputing features or writing custom implementations.

May 3, 2026

Kimi-k1.5: Scaling Reinforcement Learning with LLMs and Multimodality

Kimi-k1.5 introduces an o1-level multi-modal model that significantly advances reinforcement learning with Large Language Models. It demonstrates state-of-the-art performance in short-CoT tasks, outperforming leading models like GPT-4o and Claude Sonnet 3.5, and matches o1 performance in long-CoT scenarios across various modalities. This project highlights key innovations in long context scaling and improved policy optimization.

Apr 17, 2026

CoTracker: A Powerful Model for Tracking Any Point on a Video

CoTracker is a state-of-the-art model developed by Facebook AI Research and the University of Oxford, designed for tracking any point (pixel) across video sequences. This transformer-based solution offers fast, accurate, and quasi-dense point tracking capabilities. It is an invaluable tool for researchers and developers in computer vision, enabling precise analysis of motion in videos.

Apr 11, 2026

Spark-TTS: Efficient LLM-Based Text-to-Speech with Zero-Shot Voice Cloning

Spark-TTS is a text-to-speech system that uses large language models (LLMs) for accurate, natural-sounding voice synthesis. Built on Qwen2.5, it offers efficient synthesis, high-quality zero-shot voice cloning, bilingual support for Chinese and English, and controllable speech generation, making it suitable for both research and production.

Apr 5, 2026

AudioSep: Foundation Model for Open-Domain Sound Separation with Language Queries

AudioSep is a foundation model for open-domain sound separation that lets users isolate specific sounds using natural language descriptions. It demonstrates strong performance and zero-shot generalization across tasks including audio event, musical instrument, and speech separation. Text-based queries replace task-specific separation pipelines with a single interface.

Mar 30, 2026

JAX: Composable Transformations for Python+NumPy Programs

JAX is a Python library for high-performance numerical computing and large-scale machine learning. It offers composable function transformations such as automatic differentiation, JIT compilation for accelerators (GPU/TPU), and automatic vectorization. Because these transformations compose, developers can write flexible and efficient numerical programs.

Mar 26, 2026

Translation Agent: Agentic Translation with LLM Reflection Workflow

Translation Agent is a Python demonstration of an agentic workflow for machine translation, leveraging large language models (LLMs) and a reflection process. This innovative approach aims to improve translation quality by having the LLM translate, reflect on its output, and then refine the translation based on its own suggestions. It offers significant customizability for style, idioms, and regional language variations, making it a promising direction for future translation technologies.
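
The translate, reflect, refine loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual code: the function name `translate_with_reflection`, the prompt wording, and the `fake_llm` stand-in are all hypothetical, and a real setup would pass in a function that calls an actual LLM.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def translate_with_reflection(text: str, source: str, target: str, llm: LLM) -> str:
    """Translate, reflect on the draft, then refine it (illustrative only)."""
    # Step 1: produce an initial draft translation.
    draft = llm(f"Translate this {source} text into {target}:\n{text}")

    # Step 2: ask the model to critique its own draft.
    critique = llm(
        f"List concrete suggestions to improve this {target} translation.\n"
        f"Source: {text}\nTranslation: {draft}"
    )

    # Step 3: rewrite the draft using the critique.
    return llm(
        "Rewrite the translation using these suggestions.\n"
        f"Source: {text}\nTranslation: {draft}\nSuggestions: {critique}"
    )


def fake_llm(prompt: str) -> str:
    """Stand-in model (an assumption) so the sketch runs without an API key."""
    if prompt.startswith("Translate"):
        return "Hola mundo"
    if prompt.startswith("List"):
        return "Add punctuation."
    return "Hola, mundo!"


result = translate_with_reflection("Hello, world!", "English", "Spanish", fake_llm)
```

Style, idiom, or regional preferences would typically be expressed by adding instructions to the three prompts, which is why the model is passed in as a plain callable here.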

Mar 24, 2026

LLMBox: A Comprehensive Python Library for LLM Training and Evaluation

LLMBox is a Python library for implementing large language models, offering a unified training pipeline and extensive model evaluation capabilities. It aims to be a one-stop solution for both training and using LLMs, with an emphasis on flexibility and efficiency. It supports diverse training strategies and fast inference.

Mar 16, 2026

WiFi-3D-Fusion: Real-Time 3D Human Pose Estimation from WiFi Signals

WiFi-3D-Fusion is an open-source research project that uses WiFi channel state information (CSI) and deep learning to estimate 3D human pose. It fuses wireless sensing with computer vision techniques and offers real-time motion detection and visualization of human movement in 3D space.

Mar 15, 2026