Promptfoo: LLM Evaluation and Red Teaming for AI Applications

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

Promptfoo is an open-source CLI and library designed for evaluating and red-teaming Large Language Model (LLM) applications. It enables developers to test prompts, agents, and RAGs, compare model performance, and secure AI apps through vulnerability scanning. With simple declarative configs and CI/CD integration, Promptfoo helps ship reliable and secure AI solutions.

Repository Information

Analyzed by OSRepos on March 24, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

Promptfoo is an open-source CLI and library for evaluating and red-teaming Large Language Model (LLM) applications. It replaces trial-and-error prompt tweaking with repeatable tests, helping developers ship secure and reliable AI apps. Used by organizations like OpenAI and Anthropic, Promptfoo tests prompts, agents, and RAG pipelines, and compares the performance of LLMs such as GPT, Claude, Gemini, and Llama.

Installation

Getting started with Promptfoo is straightforward. You can install it globally via npm, brew, or pip:

npm install -g promptfoo

Alternatively, you can use brew install promptfoo or pip install promptfoo. For quick execution without installation, npx promptfoo@latest is also available.

Most LLM providers require an API key. Set yours as an environment variable:

export OPENAI_API_KEY=sk-abc123

Examples

Once installed, you can initialize an example project and run your first evaluation:

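# Download the getting-started example into a new directory
promptfoo init --example getting-started
cd getting-started
# Run the evaluation defined in the example's config
promptfoo eval
# Open the results in the local web viewer
promptfoo view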
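The example project is driven by a declarative config file. As a rough sketch of what such a file can look like, the following writes a small promptfooconfig.yaml into a scratch directory. The directory name, prompt, model name, and test values are illustrative assumptions, not the contents of the getting-started example.

```shell
# Illustrative only: the directory name, prompt, model, and test cases
# below are assumptions made for this sketch.
mkdir -p promptfoo-demo

cat > promptfoo-demo/promptfooconfig.yaml <<'EOF'
description: Translation smoke test (hypothetical example)

prompts:
  - "Translate the following text to {{language}}: {{text}}"

providers:
  - openai:gpt-4o-mini   # assumed model; use any provider you have a key for

tests:
  - vars:
      language: French
      text: Hello world
    assert:
      - type: icontains   # case-insensitive substring check
        value: bonjour
  - vars:
      language: Spanish
      text: Good morning
    assert:
      - type: icontains
        value: buenos
EOF

echo "Wrote promptfoo-demo/promptfooconfig.yaml"
```

Running promptfoo eval from inside promptfoo-demo should then evaluate both test cases against the configured provider, assuming OPENAI_API_KEY is set.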

Beyond this first run, Promptfoo covers three main tasks: testing prompts and models with automated evaluations, securing LLM apps through red teaming and vulnerability scanning, and comparing models side by side across providers. Checks can be automated in CI/CD, and pull requests can be reviewed for LLM-related security issues. Results are available in a web viewer that shows an evaluation matrix, as command-line output, and as detailed security vulnerability reports.
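The CI/CD automation mentioned above can be approximated with a small wrapper function that a pipeline step calls. This is a minimal sketch, not an official integration: the function name run_eval_gate, the PROMPTFOO_CMD override, and the default file names are assumptions introduced here, and it relies on the CLI returning a non-zero exit status when an evaluation fails.

```shell
#!/bin/sh
# Minimal CI gate sketch. run_eval_gate, PROMPTFOO_CMD, and the default
# file names are hypothetical names chosen for this example.
run_eval_gate() {
  config="${1:-promptfooconfig.yaml}"
  output="${2:-promptfoo-results.json}"

  # Fail early with a distinct status if the config file is missing.
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 2
  fi

  # -c selects the config file and -o writes the results to a file.
  # The expansion is deliberately unquoted so that a multi-word command
  # such as "npx promptfoo@latest" is split into separate words.
  ${PROMPTFOO_CMD:-npx promptfoo@latest} eval -c "$config" -o "$output"
}
```

A pipeline step could then call run_eval_gate promptfooconfig.yaml results.json and let the job fail whenever the function returns a non-zero status.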

Why use Promptfoo?

Promptfoo stands out for several key reasons:

  • Developer-first: It's fast, with features like live reload and caching, designed for developer efficiency.
  • Private: LLM evaluations run 100% locally, so your prompts stay on your machine apart from the requests sent to the LLM providers you configure.
  • Flexible: It works seamlessly with any LLM API or programming language.
  • Battle-tested: Promptfoo powers LLM applications serving over 10 million users in production.
  • Data-driven: Make informed decisions based on concrete metrics, not just intuition.
  • Open source: It's MIT licensed, backed by an active and supportive community.
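To illustrate the flexibility point in the list above, a model or pipeline written in any language can be wrapped in a small executable. The sketch below assumes Promptfoo's script provider convention, in which the prompt arrives as the first command-line argument and the response is read from standard output. The file name echo-provider.sh and the echo behavior are hypothetical stand-ins for a real model call.

```shell
# Hypothetical stand-in for a real model call: the script simply echoes
# the prompt back. Assumes the prompt is passed as the first argument
# and the response is read from stdout.
cat > echo-provider.sh <<'EOF'
#!/bin/sh
prompt="$1"
printf 'echo: %s\n' "$prompt"
EOF
chmod +x echo-provider.sh

# Try the script by hand before wiring it into a config.
./echo-provider.sh "Hello"
```

Under that assumption, a config could list the script under providers with an entry such as 'exec: ./echo-provider.sh'.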
