Repository History
Explore all analyzed open-source repositories

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper2Code is an innovative multi-agent LLM system designed to automate the generation of code repositories directly from scientific papers in machine learning. It employs a sophisticated three-stage pipeline, encompassing planning, analysis, and code generation, each managed by specialized agents. This approach ensures faithful and high-quality implementations, outperforming existing baselines on relevant benchmarks.

context-engineering-intro: Master AI Coding Assistants with Context Engineering
Context Engineering represents a powerful evolution beyond traditional prompt engineering, focusing on providing comprehensive information to AI coding assistants for end-to-end task completion. The coleam00/context-engineering-intro repository offers a robust template and step-by-step guide to implement this discipline effectively. It enables developers to leverage AI, particularly with tools like Claude Code, to build complex features with greater consistency and fewer failures.

OmniParser: A Vision-Based Tool for GUI Agent Screen Parsing
OmniParser is a comprehensive tool developed by Microsoft for parsing user interface screenshots into structured, understandable elements. It significantly enhances the ability of vision-based models, such as GPT-4V, to generate accurate actions grounded in specific regions of a GUI. This project aims to advance pure vision-based GUI agents by providing robust screen parsing capabilities.

Memori: SQL Native Memory Layer for LLMs and AI Agents
Memori is an SQL Native Memory Layer designed for LLMs, AI Agents, and Multi-Agent Systems. It provides a robust and flexible solution for managing long-term and short-term memory, integrating seamlessly with existing software and infrastructure. This project aims to enhance AI systems with persistent, structured memory capabilities, making them more intelligent and context-aware.

TextMachina: A Python Framework for MGT Dataset Generation
TextMachina is a modular and extensible Python framework designed for creating high-quality, unbiased datasets for Machine-Generated Text (MGT) tasks. It supports detection, attribution, and boundary detection, offering a user-friendly pipeline with LLM integrations, prompt templating, and bias mitigation. This tool streamlines the process of building robust models for understanding and identifying AI-generated content.

DeepScrape: Intelligent Web Scraping & LLM-Powered Data Extraction
DeepScrape is an AI-powered web scraping tool designed for intelligent data extraction using LLMs. It leverages Playwright for browser automation and supports both cloud LLMs (OpenAI) and local LLMs (Ollama, vLLM) for transforming web content into structured JSON. This versatile tool is suited to modern web applications, RAG pipelines, and various data workflows, offering privacy-first data processing.

Airweave: Context Retrieval for AI Agents Across Apps and Databases
Airweave is an open-source context retrieval layer designed for AI agents, enabling them to access information across various applications and databases. It transforms diverse content into searchable knowledge bases, offering a standardized interface for agents to perform semantic, hybrid, and recency-biased searches. The platform simplifies data synchronization and entity extraction, and serves as a robust foundation for building intelligent AI applications.

Toolkit-for-Prompt-Compression: A Unified Toolkit for LLM Prompt Compression
PCToolkit is a unified, plug-and-play toolkit designed for efficient prompt compression in Large Language Models (LLMs). It provides state-of-the-art compression methods, diverse datasets, and comprehensive metrics for evaluating performance. This modular toolkit simplifies the process of condensing input prompts while preserving crucial information.

Picotron: Minimalistic 4D-Parallelism Framework for LLM Training Education
Picotron is a minimalistic and hackable distributed training framework designed for educational purposes. Inspired by NanoGPT, it focuses on pre-training Llama-like models using 4D parallelism (data, tensor, pipeline, and context parallelism), making complex concepts accessible. Its simple and readable codebase, with core files under 300 lines, provides an excellent tool for learning and experimentation in distributed machine learning.

Model Context Protocol TypeScript SDK: Build MCP Servers and Clients
The `modelcontextprotocol/typescript-sdk` is the official TypeScript SDK for building Model Context Protocol (MCP) servers and clients. It provides a standardized way for applications to offer context to Large Language Models (LLMs), separating context provision from LLM interaction. Developers can use it to create MCP servers that expose resources, prompts, and tools, as well as build MCP clients that connect to any MCP server.

Aider: AI Pair Programming in Your Terminal
Aider is an open-source project that brings AI pair programming directly to your terminal, enabling developers to collaborate with large language models (LLMs). It helps developers start new projects or extend existing codebases. With robust features like codebase mapping, Git integration, and multi-language support, Aider is a versatile tool for modern development workflows.

Ant Design X: Crafting AI-Driven Interfaces with React Components
Ant Design X is an innovative open-source project by Ant Design, designed to simplify the creation of AI-driven user interfaces. It offers a comprehensive suite of enterprise-level LLM components, tools for efficient data stream management, and a high-performance Markdown renderer. Built with TypeScript, it leverages React and Ant Design to provide a robust solution for modern AI applications.