Repository History

3 repositories tagged with evaluation

Topic: evaluation

LangWatch: The Platform for LLM Evaluations and AI Agent Testing

LangWatch is an open-source platform for end-to-end LLM evaluation and AI agent testing. It helps teams test, simulate, evaluate, and monitor LLM-powered agents both before release and in production. By covering regression testing, simulations, and production observability in one place, it aims to reduce the need for custom tooling.

Analyzed Apr 28, 2026

Promptfoo: LLM Evaluation and Red Teaming for AI Applications

Promptfoo is an open-source CLI and library for evaluating and red-teaming Large Language Model (LLM) applications. It lets developers test prompts, agents, and RAG pipelines, compare model performance, and scan AI apps for vulnerabilities. Declarative configs and CI/CD integration let these checks run as part of a regular release workflow.

Analyzed Mar 24, 2026

Langsmith-sdk: Client SDK for LLM Debugging, Evaluation, and Monitoring

Langsmith-sdk provides the client SDKs for the LangSmith platform, which supports debugging, evaluation, and monitoring of language models and intelligent agents. The SDKs integrate natively with both LangChain Python and LangChain JS.

Analyzed Mar 18, 2026
