Promptfoo: LLM Evaluation and Red Teaming for AI Applications

Summary
Promptfoo is an open-source CLI and library designed for evaluating and red-teaming Large Language Model (LLM) applications. It enables developers to test prompts, agents, and RAGs, compare model performance, and secure AI apps through vulnerability scanning. With simple declarative configs and CI/CD integration, Promptfoo helps ship reliable and secure AI solutions.
Introduction
Promptfoo is a powerful open-source CLI and library for evaluating and red-teaming Large Language Model (LLM) applications. It helps developers move beyond trial-and-error, enabling them to ship secure and reliable AI apps. Used by organizations like OpenAI and Anthropic, Promptfoo provides a robust framework for testing prompts, agents, and RAGs, as well as comparing the performance of various LLMs such as GPT, Claude, Gemini, and Llama.
Installation
Getting started with Promptfoo is straightforward. You can install it globally via npm, brew, or pip:
npm install -g promptfoo
Alternatively, you can use brew install promptfoo or pip install promptfoo. For quick execution without installation, npx promptfoo@latest is also available.
Most LLM providers require an API key. Set yours as an environment variable:
export OPENAI_API_KEY=sk-abc123
Examples
Once installed, you can initialize an example project and run your first evaluation:
promptfoo init --example getting-started
cd getting-started
promptfoo eval
promptfoo view
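The example project is driven by a declarative `promptfooconfig.yaml` file, which `promptfoo eval` reads by default. A minimal sketch of such a config follows; the specific provider ID, prompt, and assertion values here are illustrative, not the contents of the actual example project:

```yaml
# promptfooconfig.yaml — minimal illustrative evaluation config
prompts:
  - "Summarize this in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini   # any supported provider ID can go here

tests:
  - vars:
      text: "Promptfoo is an open-source tool for evaluating LLM applications."
    assert:
      # deterministic string check
      - type: contains
        value: "Promptfoo"
      # model-graded check
      - type: llm-rubric
        value: "Is a concise, accurate one-sentence summary"
```

Each entry under tests is expanded against every prompt/provider combination, which is what produces the side-by-side evaluation matrix shown by promptfoo view.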
Promptfoo offers a comprehensive suite of features to streamline your LLM development workflow:
- Test prompts and models with automated evaluations
- Secure your LLM apps with red teaming and vulnerability scanning
- Compare models side-by-side across various providers
- Automate checks in CI/CD and review pull requests for LLM-related security issues
- Inspect results through a visual web viewer for evaluation matrices, command-line output, and detailed security vulnerability reports
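One common way to automate checks in CI/CD is to run the evaluation as a pipeline step, since a failed assertion causes the eval to exit with a nonzero status and fail the build. A sketch as a GitHub Actions workflow (the workflow name, config path, and secret name are assumptions for illustration):

```yaml
# .github/workflows/llm-eval.yml — illustrative CI step, not an official template
name: LLM eval
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # run the eval against the repo's config; a failed assertion
      # fails this step and therefore the pull request check
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

Keeping the API key in a CI secret mirrors the local environment-variable setup shown above, so the same config runs unchanged in both places.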
Why use Promptfoo?
Promptfoo stands out for several key reasons:
- Developer-first: It's fast, with features like live reload and caching, designed for developer efficiency.
- Private: LLM evaluations run 100% locally, ensuring your prompts never leave your machine.
- Flexible: It works seamlessly with any LLM API or programming language.
- Battle-tested: Promptfoo powers LLM applications serving over 10 million users in production.
- Data-driven: Make informed decisions based on concrete metrics, not just intuition.
- Open source: It's MIT licensed, backed by an active and supportive community.
Links
Explore Promptfoo further with these official resources: