Repository History

Explore all analyzed open-source repositories

Helium: A Privacy-First, Chromium-Based Web Browser

Helium is an open-source, Chromium-based web browser designed with a strong focus on user privacy and unbiased ad blocking. It aims to provide a fast, honest, and bloat-free browsing environment. Built on ungoogled-chromium, it offers a secure alternative for everyday web browsing.

Analyzed Jul 1, 2026

ChatArena: Multi-Agent Language Game Environments for LLMs

ChatArena is a Python library that provides multi-agent language game environments for Large Language Models (LLMs), with the goal of fostering the development of communication and collaboration capabilities in AI. It offers a flexible framework for defining players, environments, and their interactions based on Markov Decision Processes. Note that as of August 11, 2025, the project is deprecated due to limited community adoption and no longer receives updates or support.

Analyzed Jul 1, 2026

Agentarium: A Python Framework for AI Agent Simulations

Agentarium is an open-source Python framework for creating and managing simulations with AI-powered agents. It offers an intuitive platform for designing complex, interactive environments in which agents can act, learn, and evolve, and it simplifies the orchestration of multiple AI agents and their interactions.

Analyzed Jul 1, 2026

Lighteval: Your All-in-One Toolkit for LLM Evaluation

Lighteval is a comprehensive toolkit from Hugging Face for evaluating Large Language Models (LLMs) across various backends. It lets users examine model performance in depth by saving detailed, sample-by-sample results, and it supports over 1,000 evaluation tasks. The framework is highly customizable, allowing users to create custom tasks and metrics tailored to their needs.

Analyzed Jul 1, 2026

PromptBench: A Unified Framework for LLM Evaluation and Robustness

PromptBench is a comprehensive Python library designed for the evaluation and understanding of Large Language Models (LLMs). It provides a unified framework for assessing model performance, exploring various prompt engineering techniques, and evaluating robustness against adversarial attacks. This tool empowers researchers to conduct in-depth analyses of LLMs across diverse datasets and models.

Analyzed Jul 1, 2026

LangTest: A Comprehensive Library for Safe & Effective Language Models

LangTest is an open-source Python library dedicated to ensuring the safety and effectiveness of language models. It offers a comprehensive framework for testing model quality, covering robustness, bias, fairness, and accuracy across various NLP tasks and LLM providers. With LangTest, developers can generate and execute over 60 distinct test types with just one line of code, promoting responsible AI development.

Analyzed Jun 30, 2026

EvalPlus: Rigorous Evaluation for LLM-Synthesized Code

EvalPlus is a framework for the rigorous evaluation of code generated by Large Language Models (LLMs). It extends standard benchmarks such as HumanEval and MBPP with significantly more tests, giving a more precise assessment of code correctness and efficiency. It is aimed at developers and researchers who need to thoroughly validate LLM-synthesized code.

Analyzed Jun 30, 2026

AgentEvals: Robust Evaluation Tools for LLM Agent Trajectories

AgentEvals is an open-source package from LangChain designed to simplify the evaluation of agentic applications. It provides a collection of ready-made evaluators and utilities, with a particular focus on analyzing agent trajectories (the intermediate steps an agent takes to solve a problem). This helps developers understand and improve the reliability and performance of their LLM agents.

Analyzed Jun 30, 2026

Evidently: Open-Source ML and LLM Observability Framework

Evidently is an open-source Python library designed for evaluating, testing, and monitoring machine learning and large language model systems. It provides over 100 built-in metrics for various tasks, from data drift detection to LLM judges, supporting both tabular and text data. This framework helps ensure the quality and performance of AI-powered systems throughout their lifecycle.

Analyzed Jun 30, 2026

no-mistakes: AI-Driven Git Proxy for Flawless Pull Requests

no-mistakes is a Git proxy that streamlines the pull request workflow by checking code quality before changes reach your remote. It runs an AI-driven validation pipeline in a disposable worktree, automatically applying safe fixes and escalating complex issues for human review. This helps developers keep their codebases clean and open high-quality PRs with less effort.

Analyzed Jun 30, 2026

OpenMontage: The First Open-Source, Agentic Video Production System

OpenMontage describes itself as the world's first open-source, agentic video production system, designed to turn your AI coding assistant into a full video production studio. It features 12 pipelines, 52 tools, and over 500 agent skills, enabling end-to-end video creation from a simple prompt. It handles research, scripting, asset generation, editing, and final composition, and it can also produce real video from stock footage.

Analyzed Jun 29, 2026

builderio-agent-native

Analyzed Jun 29, 2026
