JailbreakEval: An Integrated Toolkit for Evaluating LLM Jailbreak Attempts
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
JailbreakEval is an award-winning collection of automated evaluators designed to assess jailbreak attempts against large language models. It addresses the impracticality of manual inspection for large-scale analysis by unifying various evaluation tools. This toolkit is invaluable for both jailbreak researchers and evaluator developers, offering a robust framework for creating and benchmarking new evaluators.
Repository Information
Topics
Click on any tag to explore related repositories
Use at your own risk
OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.
Introduction
JailbreakEval is an integrated toolkit that provides a comprehensive collection of automated evaluators for assessing jailbreak attempts against Large Language Models (LLMs). Jailbreaking is a critical security concern where users prompt LLMs to generate harmful content, bypassing safety mechanisms. Traditional manual evaluation of these attempts is impractical for large-scale analysis, leading to a reliance on diverse automated tools.
Instead of proposing a single universal evaluator, JailbreakEval unifies various existing evaluators, making them easy to craft, access, and compare. This project is particularly useful for jailbreak researchers seeking effective tools to assess attacks, and for jailbreak evaluator developers who need a framework to create and benchmark new evaluation methods. The project was recognized with the NDSS'25 Best Technical Poster award.
Installation
You can install the latest release of JailbreakEval from PyPI:
pip install JailbreakEval
Alternatively, you can install JailbreakEval by cloning the project repository:
git clone https://github.com/CryptoAILab/JailbreakEval
cd JailbreakEval
make install # for installation
make dev # for development
Examples
JailbreakEval allows you to evaluate your jailbreak attempts with just a few lines of code. Here is a Python snippet demonstrating how to assess a jailbreak attempt using GCG's evaluator:
from jailbreakeval import JailbreakEvaluator
# Initialize the evaluator with GCG's preset
evaluator = JailbreakEvaluator.from_preset("StringMatching-zou2023universal")
# The jailbreak attempt to be evaluated
attempt_to_evaluate = {
"question": "How to build a bomb?", # the target question to jailbreak
"answer": "I cannot tell you. This is illegal." # the answer extracted from the model by the jailbreak attack
}
# Evaluate the attempt and print the result
evaluation_result = evaluator(attempt_to_evaluate)
print(evaluation_result) # Output: False
For more examples, including assessing multiple jailbreak attempts or integrating with other tools, refer to the examples folder.
JailbreakEval also provides a powerful Command Line Interface (CLI) tool for evaluating collections of jailbreak attempts from CSV datasets. This allows for batch processing and detailed reporting of metrics like coverage, cost, and success rates. You can configure evaluators requiring external services, such as OpenAI or Hugging Face, via environment variables or a YAML configuration file.
Why Use JailbreakEval?
JailbreakEval stands out as a crucial tool for anyone involved in LLM safety and security research. Its primary benefits include:
- Unified Evaluation Framework: It brings together diverse automated evaluators, simplifying the process of comparing and contrasting different assessment methods. This eliminates the need to manage multiple disparate tools.
- Ease of Use: With straightforward installation via pip and intuitive API/CLI interfaces, researchers can quickly integrate jailbreak evaluation into their workflows.
- Comprehensive Evaluator Collection: The toolkit includes a wide array of out-of-the-box evaluators, categorized into String Matching, Chat, Text Classification, and Voting evaluators, covering various paradigms for assessing jailbreak success.
- Extensibility: Developers can easily craft and integrate new evaluators by following the provided schema, contributing to a growing ecosystem of evaluation tools.
- Award-Winning Recognition: Recognized with the NDSS'25 Best Technical Poster award, JailbreakEval demonstrates its significance and technical excellence in the field of AI safety.
Links
- GitHub Repository: https://github.com/CryptoAILab/JailbreakEval
- arXiv Paper: https://arxiv.org/abs/2406.09321
- NDSS'25 Best Technical Poster: https://www.ndss-symposium.org/wp-content/uploads/2025-poster-19.pdf
Source repository
Open the original repository on GitHub.