Repository History
3 repositories tagged with evaluation

LangWatch: The Platform for LLM Evaluations and AI Agent Testing
LangWatch is an open-source platform for end-to-end LLM evaluations and AI agent testing. It helps teams test, simulate, evaluate, and monitor LLM-powered agents both before release and in production. By covering regression testing, simulations, and production observability in one place, LangWatch aims to remove the need for custom tooling.

Promptfoo: LLM Evaluation and Red Teaming for AI Applications
Promptfoo is an open-source CLI and library for evaluating and red-teaming Large Language Model (LLM) applications. It lets developers test prompts, agents, and RAG pipelines, compare model performance, and harden AI apps through vulnerability scanning. With simple declarative configs and CI/CD integration, Promptfoo helps teams ship reliable and secure AI solutions.

Langsmith-sdk: Client SDK for LLM Debugging, Evaluation, and Monitoring
The Langsmith-sdk repository provides client SDKs for the LangSmith platform, supporting debugging, evaluation, and monitoring of language models and intelligent agents. It offers native integrations with both LangChain Python and LangChain JS, which makes it useful for LLM application development.