Promptfoo: LLM Evaluation and Red Teaming for AI Applications
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
Promptfoo is an open-source CLI and library for evaluating and red-teaming Large Language Model (LLM) applications. It lets developers test prompts, agents, and RAG pipelines, compare model performance, and secure AI apps through vulnerability scanning. With simple declarative configs and CI/CD integration, Promptfoo helps teams ship reliable and secure AI applications.
Use at your own risk
OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.
Introduction
Promptfoo is an open-source CLI and library for evaluating and red-teaming Large Language Model (LLM) applications. It helps developers replace trial-and-error with systematic testing so they can ship secure and reliable AI apps. Used by organizations such as OpenAI and Anthropic, Promptfoo provides a framework for testing prompts, agents, and retrieval-augmented generation (RAG) pipelines, and for comparing the performance of LLMs such as GPT, Claude, Gemini, and Llama.
Installation
Getting started with Promptfoo is straightforward. You can install it globally via npm, brew, or pip:
npm install -g promptfoo
Alternatively, you can use brew install promptfoo or pip install promptfoo. For quick execution without installation, npx promptfoo@latest is also available.
Most LLM providers require an API key. Set yours as an environment variable:
export OPENAI_API_KEY=sk-abc123
Examples
Once installed, initialize an example project, run your first evaluation, and open the results in the web viewer:
promptfoo init --example getting-started
cd getting-started
promptfoo eval
promptfoo view
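The `promptfoo eval` step reads a declarative config file. As a minimal sketch, assuming the default filename `promptfooconfig.yaml` and that the OpenAI model IDs shown are available to your API key, the following shell snippet (run from an empty directory of your own, not inside the example project) writes a config that sends one prompt template to two models and checks each output with an assertion:

```shell
#!/usr/bin/env bash
# Write a minimal promptfoo config: one prompt template with a {{text}} variable,
# two providers to compare side by side, and one test case with an assertion.
# The model IDs are examples; substitute models your account can access.
cat > promptfooconfig.yaml <<'EOF'
description: "Compare two models on a one-sentence summary"

prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini
  - openai:gpt-4o

tests:
  - vars:
      text: "Promptfoo is an open-source CLI for evaluating LLM applications."
    assert:
      - type: icontains
        value: promptfoo
EOF
```

Running `promptfoo eval` in that directory then evaluates the test case against both models, and `promptfoo view` shows their outputs side by side. The `icontains` assertion is a case-insensitive substring check; promptfoo documents other assertion types, including model-graded rubrics.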
Promptfoo offers a broad set of features for the LLM development workflow. You can test prompts and models with automated evaluations, secure LLM apps with red teaming and vulnerability scanning, and compare models side by side across providers. It also supports automating checks in CI/CD and reviewing pull requests for LLM-related security issues. Results can be reviewed in a web viewer that shows an evaluation matrix, in the command-line output, or as detailed security vulnerability reports.
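To automate evaluation checks in CI/CD, one approach is to wrap the eval command in a small shell function. This is a sketch, not an official integration: the function name `ci_eval` and its default arguments are hypothetical, and it assumes that `promptfoo eval` exits with a non-zero status when assertions fail (verify this against the CLI docs for your version). The `-c` flag selects the config file and `-o` writes results to a file the pipeline can archive.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical CI helper; ci_eval is an illustrative name, not part of promptfoo.
# Usage: ci_eval [config-file] [results-file]
# Assumption: promptfoo eval exits non-zero when assertions fail, so returning
# its exit status is enough to fail the CI job.
ci_eval() {
  local config="${1:-promptfooconfig.yaml}"
  local output="${2:-results.json}"
  # npx runs promptfoo without a global install.
  npx promptfoo@latest eval -c "$config" -o "$output"
}
```

In a pipeline, export the provider API key from the CI secret store, source this script, and call `ci_eval` (or `ci_eval path/to/config.yaml out.json`); under the exit-status assumption above, a failing assertion then fails the job.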
Why use Promptfoo?
Promptfoo stands out for several key reasons:
- Developer-first: It's fast, with live reload and caching built in.
- Private: LLM evaluations run 100% locally, so your prompts go only to the model providers you configure.
- Flexible: It works seamlessly with any LLM API or programming language.
- Battle-tested: Promptfoo powers LLM applications serving over 10 million users in production.
- Data-driven: Make informed decisions based on concrete metrics, not just intuition.
- Open source: It's MIT licensed, backed by an active and supportive community.
Related repositories
Similar repositories that may be relevant next.

Loop Engineering: Orchestrating AI Agents with Practical Patterns and Tools
June 25, 2026
Loop Engineering is a GitHub repository offering practical patterns, starters, and CLI tools for building robust AI coding agent systems. It shifts the focus from individual prompt crafting to designing control systems that orchestrate agents over time. This project empowers developers to create autonomous, iterative AI workflows for various development tasks.

MarkLLM: An Open-Source Toolkit for LLM Watermarking
June 23, 2026
MarkLLM is an open-source toolkit designed to simplify the research and application of watermarking technologies for large language models (LLMs). It offers a unified framework for implementing various watermarking algorithms, alongside robust visualization and comprehensive evaluation tools. This toolkit helps researchers and the broader community understand and assess the authenticity and origin of machine-generated text.

GLM-5: Flagship Models for Long-Horizon Agentic Engineering
June 18, 2026
GLM-5 is a series of flagship models, including GLM-5.2, GLM-5.1, and GLM-5, developed by zai-org for complex systems engineering and long-horizon agentic tasks. These models offer advanced coding capabilities, impressive context lengths, and state-of-the-art performance on various benchmarks. They are designed to sustain effective problem-solving over extended sessions through iterative reasoning and strategy revision.

Deliberation: Multi-Agent LLM Consensus for Code and Plan Review
June 15, 2026
Deliberation is a GitHub repository that enables Claude Code to consult multiple LLMs, such as GPT, Gemini, Grok, and 400+ OpenRouter models, for expert second opinions and arbiter-mediated consensus. It provides specialized AI agents for tasks such as code review, security analysis, and architectural design, aiming to give comprehensive and reliable feedback. This project helps developers gather diverse perspectives and improve the quality of their work.