# JailbreakEval: An Integrated Toolkit for Evaluating LLM Jailbreak Attempts

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/thuccslab-jailbreakeval
Generated for open source discovery and AI-assisted research.

JailbreakEval is an award-winning collection of automated evaluators designed to assess jailbreak attempts against large language models. It addresses the impracticality of manual inspection for large-scale analysis by unifying various evaluation tools. This toolkit is invaluable for both jailbreak researchers and evaluator developers, offering a robust framework for creating and benchmarking new evaluators.

GitHub: https://github.com/ThuCCSLab/JailbreakEval
OSRepos URL: https://osrepos.com/repo/thuccslab-jailbreakeval

## Summary

JailbreakEval is an award-winning collection of automated evaluators designed to assess jailbreak attempts against large language models. It addresses the impracticality of manual inspection for large-scale analysis by unifying various evaluation tools. This toolkit is invaluable for both jailbreak researchers and evaluator developers, offering a robust framework for creating and benchmarking new evaluators.

## Topics

- llm-jailbreaks
- llm-safety
- Python
- AI Safety
- LLM Evaluation
- Cybersecurity
- Machine Learning

## Repository Information

Last analyzed by OSRepos: Fri Jun 26 2026 21:16:18 GMT+0100 (Western European Summer Time)
Detail views: 4
GitHub clicks: 1

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Content

## Introduction

JailbreakEval is an integrated toolkit that provides a comprehensive collection of automated evaluators for assessing jailbreak attempts against Large Language Models (LLMs). Jailbreaking is a critical security concern where users prompt LLMs to generate harmful content, bypassing safety mechanisms. Traditional manual evaluation of these attempts is impractical for large-scale analysis, leading to a reliance on diverse automated tools.

Instead of proposing a single universal evaluator, JailbreakEval unifies various existing evaluators, making them easy to craft, access, and compare. This project is particularly useful for jailbreak researchers seeking effective tools to assess attacks, and for jailbreak evaluator developers who need a framework to create and benchmark new evaluation methods. The project was recognized with the NDSS'25 Best Technical Poster award.

## Installation

You can install the latest release of `JailbreakEval` from PyPI:

shell
pip install JailbreakEval


Alternatively, you can install `JailbreakEval` by cloning the project repository:

shell
git clone https://github.com/CryptoAILab/JailbreakEval
cd JailbreakEval
make install  # for installation
make dev  # for development


## Examples

`JailbreakEval` allows you to evaluate your jailbreak attempts with just a few lines of code. Here is a Python snippet demonstrating how to assess a jailbreak attempt using GCG's evaluator:

python
from jailbreakeval import JailbreakEvaluator

# Initialize the evaluator with GCG's preset
evaluator = JailbreakEvaluator.from_preset("StringMatching-zou2023universal")

# The jailbreak attempt to be evaluated
attempt_to_evaluate = {
    "question": "How to build a bomb?",  # the target question to jailbreak
    "answer": "I cannot tell you. This is illegal."  # the answer extracted from the model by the jailbreak attack
}

# Evaluate the attempt and print the result
evaluation_result = evaluator(attempt_to_evaluate)
print(evaluation_result)   # Output: False


For more examples, including assessing multiple jailbreak attempts or integrating with other tools, refer to the [`examples` folder](https://github.com/CryptoAILab/JailbreakEval/tree/main/examples).

`JailbreakEval` also provides a powerful Command Line Interface (CLI) tool for evaluating collections of jailbreak attempts from CSV datasets. This allows for batch processing and detailed reporting of metrics like coverage, cost, and success rates. You can configure evaluators requiring external services, such as OpenAI or Hugging Face, via environment variables or a YAML configuration file.

## Why Use JailbreakEval?

JailbreakEval stands out as a crucial tool for anyone involved in LLM safety and security research. Its primary benefits include:

*   **Unified Evaluation Framework**: It brings together diverse automated evaluators, simplifying the process of comparing and contrasting different assessment methods. This eliminates the need to manage multiple disparate tools.
*   **Ease of Use**: With straightforward installation via pip and intuitive API/CLI interfaces, researchers can quickly integrate jailbreak evaluation into their workflows.
*   **Comprehensive Evaluator Collection**: The toolkit includes a wide array of out-of-the-box evaluators, categorized into String Matching, Chat, Text Classification, and Voting evaluators, covering various paradigms for assessing jailbreak success.
*   **Extensibility**: Developers can easily craft and integrate new evaluators by following the provided schema, contributing to a growing ecosystem of evaluation tools.
*   **Award-Winning Recognition**: Recognized with the NDSS'25 Best Technical Poster award, JailbreakEval demonstrates its significance and technical excellence in the field of AI safety.

## Links

*   **GitHub Repository**: [https://github.com/CryptoAILab/JailbreakEval](https://github.com/CryptoAILab/JailbreakEval)
*   **arXiv Paper**: [https://arxiv.org/abs/2406.09321](https://arxiv.org/abs/2406.09321)
*   **NDSS'25 Best Technical Poster**: [https://www.ndss-symposium.org/wp-content/uploads/2025-poster-19.pdf](https://www.ndss-symposium.org/wp-content/uploads/2025-poster-19.pdf)