Promptfoo: LLM Evaluation and Red Teaming for AI Applications

Summary

Promptfoo is an open-source CLI and library for evaluating and red-teaming Large Language Model (LLM) applications. It lets developers test prompts, agents, and retrieval-augmented generation (RAG) pipelines, compare model performance, and secure AI apps through vulnerability scanning. With simple declarative configs and CI/CD integration, Promptfoo helps teams ship reliable and secure AI applications.


Introduction

Promptfoo is an open-source CLI and library for evaluating and red-teaming LLM applications. It helps developers replace trial-and-error with systematic testing so they can ship secure and reliable AI apps. Used by organizations such as OpenAI and Anthropic, Promptfoo provides a framework for testing prompts, agents, and RAG pipelines, and for comparing models such as GPT, Claude, Gemini, and Llama.

Installation

Getting started with Promptfoo is straightforward. You can install it globally with npm, Homebrew, or pip:

npm install -g promptfoo

Alternatively, install it with brew install promptfoo or pip install promptfoo. To run it without installing anything, use npx promptfoo@latest.

Most LLM providers require an API key. Set yours as an environment variable:

export OPENAI_API_KEY=sk-abc123

Examples

Once installed, you can create the getting-started example project, run its evaluation, and open the results in the web viewer:

promptfoo init --example getting-started
cd getting-started
promptfoo eval
promptfoo view

Promptfoo covers the main stages of an LLM development workflow. You can test prompts and models with automated evaluations, secure LLM apps with red teaming and vulnerability scanning, and compare models from different providers side by side. It can also run checks automatically in CI/CD and review pull requests for LLM-related security issues. Results can be inspected in a web viewer that displays an evaluation matrix, in the command-line output, or in detailed security vulnerability reports.
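Automated evaluations are driven by assertions attached to each test case. Besides its built-in assertion types, Promptfoo can load a Python file as an assertion, referenced from a test's assert list with type: python and a value such as file://assert.py. The sketch below assumes the convention that such a file defines get_assert(output, context) and returns either a bool, a score, or a dict with pass, score, and reason keys. The keyword rule and the REQUIRED_TERMS list are hypothetical, chosen only to illustrate a graded check.

```python
"""assert.py: a hypothetical custom assertion for a Promptfoo test case.

Assumes Promptfoo's Python assertion convention: the module defines
get_assert(output, context) and returns a dict with pass/score/reason.
"""
from typing import Any, Dict

# Hypothetical grading rule: the answer must mention all of these terms.
REQUIRED_TERMS = ("refund", "30 days")


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Score the output by the fraction of required terms it contains."""
    text = output.lower()
    missing = [term for term in REQUIRED_TERMS if term not in text]
    score = (len(REQUIRED_TERMS) - len(missing)) / len(REQUIRED_TERMS)
    if missing:
        reason = "missing: " + ", ".join(missing)
    else:
        reason = "all required terms present"
    # Pass only when every required term appears; the score records partial credit.
    return {"pass": not missing, "score": score, "reason": reason}


if __name__ == "__main__":
    # Local smoke test without Promptfoo; context is unused by this rule.
    print(get_assert("Refunds are accepted within 30 days.", {}))
```

Returning a fractional score alongside pass/fail lets the results matrix show partial credit rather than a bare failure.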

Why use Promptfoo?

Promptfoo stands out for several key reasons:

Developer-first: It's fast, with live reload and caching to shorten the edit-and-test loop.
Private: Evaluations run 100% locally, so your prompts do not pass through Promptfoo's servers; they are sent only to the model providers you configure.
Flexible: It works with any LLM API or programming language.
Battle-tested: Promptfoo powers LLM applications serving over 10 million users in production.
Data-driven: Make informed decisions based on concrete metrics, not just intuition.
Open source: It's MIT licensed, backed by an active and supportive community.
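Much of the flexibility comes from custom providers: in addition to built-in model integrations, Promptfoo can call a Python script as a provider, referenced in the config by an id such as file://provider.py. The sketch below assumes the convention that the script defines call_api(prompt, options, context), that provider settings arrive under options["config"], and that the function returns a dict with an output key (or an error key on failure). The fake_model function and the max_words setting are hypothetical stand-ins for a call to your own API or local model.

```python
"""provider.py: a hypothetical custom Promptfoo provider written in Python.

Assumes Promptfoo's Python provider convention: the module defines
call_api(prompt, options, context) and returns a dict containing
"output" on success or "error" on failure.
"""
from typing import Any, Dict


def fake_model(prompt: str, max_words: int) -> str:
    """Hypothetical stand-in for a model call: echo the first few words."""
    return " ".join(prompt.split()[:max_words])


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    # Assumed: provider-level settings from the config arrive under options["config"].
    config = options.get("config") or {}
    max_words = int(config.get("max_words", 5))  # hypothetical setting
    if not prompt.strip():
        return {"error": "empty prompt"}
    try:
        return {"output": fake_model(prompt, max_words)}
    except Exception as exc:  # report failures to Promptfoo instead of raising
        return {"error": str(exc)}


if __name__ == "__main__":
    print(call_api("Summarize the refund policy in one sentence", {"config": {"max_words": 3}}, {}))
```

Returning errors as data rather than raising keeps a single failing call from aborting the whole evaluation run.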