Recently analyzed

Fresh repository analysis from the OSRepos archive.

ChatArena: Multi-Agent Language Game Environments for LLMs

ChatArena: Multi-Agent Language Game Environments for LLMs

ChatArena is a Python library designed to provide multi-agent language game environments for Large Language Models (LLMs), aiming to foster the development of communication and collaboration capabilities in AI. It offers a flexible framework for defining players, environments, and interactions based on Markov Decision Processes. Please note that as of August 11, 2025, this project has been deprecated due to a lack of widespread community use and is no longer receiving updates or support.

Analyzed Jul 1, 2026
View Details
Agentarium: A Python Framework for AI Agent Simulations

Agentarium: A Python Framework for AI Agent Simulations

Agentarium is an open-source Python framework designed for creating and managing simulations with AI-powered agents. It offers an intuitive platform for designing complex, interactive environments where agents can act, learn, and evolve. This powerful tool simplifies the orchestration of multiple AI agents and their interactions.

Analyzed Jul 1, 2026
View Details
Lighteval: Your All-in-One Toolkit for LLM Evaluation

Lighteval: Your All-in-One Toolkit for LLM Evaluation

Lighteval is a comprehensive toolkit from Hugging Face for evaluating Large Language Models (LLMs) across various backends. It enables users to dive deep into model performance by saving detailed, sample-by-sample results and supports over 1000 evaluation tasks. The framework offers extensive customization options, allowing users to create custom tasks and metrics tailored to their specific needs.

Analyzed Jul 1, 2026
View Details
PromptBench: A Unified Framework for LLM Evaluation and Robustness

PromptBench: A Unified Framework for LLM Evaluation and Robustness

PromptBench is a comprehensive Python library designed for the evaluation and understanding of Large Language Models (LLMs). It provides a unified framework for assessing model performance, exploring various prompt engineering techniques, and evaluating robustness against adversarial attacks. This tool empowers researchers to conduct in-depth analyses of LLMs across diverse datasets and models.

Analyzed Jul 1, 2026
View Details
LangTest: A Comprehensive Library for Safe & Effective Language Models

LangTest: A Comprehensive Library for Safe & Effective Language Models

LangTest is an open-source Python library dedicated to ensuring the safety and effectiveness of language models. It offers a comprehensive framework for testing model quality, covering robustness, bias, fairness, and accuracy across various NLP tasks and LLM providers. With LangTest, developers can generate and execute over 60 distinct test types with just one line of code, promoting responsible AI development.

Analyzed Jun 30, 2026
View Details
EvalPlus: Rigorous Evaluation for LLM-Synthesized Code

EvalPlus: Rigorous Evaluation for LLM-Synthesized Code

EvalPlus is a robust framework designed for the rigorous evaluation of code generated by Large Language Models (LLMs). It extends standard benchmarks like HumanEval and MBPP with significantly more tests, offering precise assessment of code correctness and efficiency. This tool is crucial for developers and researchers aiming to thoroughly validate LLM-synthesized code.

Analyzed Jun 30, 2026
View Details

Stay Updated

Get notified about new repositories and updates. Join our community of developers!

OS
OSRepos

Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

Monitor your website with YourWebsiteScore

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of third-party repository code is at your own risk. Always review source code, dependencies, licenses, and security implications before running anything.

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️