{"name":"JailbreakEval: An Integrated Toolkit for Evaluating LLM Jailbreak Attempts","description":"JailbreakEval is an award-winning collection of automated evaluators designed to assess jailbreak attempts against large language models. It addresses the impracticality of manual inspection for large-scale analysis by unifying various evaluation tools. This toolkit is invaluable for both jailbreak researchers and evaluator developers, offering a robust framework for creating and benchmarking new evaluators.","github":"https://github.com/ThuCCSLab/JailbreakEval","url":"https://osrepos.com/repo/thuccslab-jailbreakeval","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/thuccslab-jailbreakeval","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/thuccslab-jailbreakeval.md","json":"https://osrepos.com/repo/thuccslab-jailbreakeval.json","topics":["llm-jailbreaks","llm-safety","Python","AI Safety","LLM Evaluation","Cybersecurity","Machine Learning"],"keywords":["llm-jailbreaks","llm-safety","Python","AI Safety","LLM Evaluation","Cybersecurity","Machine Learning"],"stars":null,"summary":"JailbreakEval is an award-winning collection of automated evaluators designed to assess jailbreak attempts against large language models. It addresses the impracticality of manual inspection for large-scale analysis by unifying various evaluation tools. This toolkit is invaluable for both jailbreak researchers and evaluator developers, offering a robust framework for creating and benchmarking new evaluators.","content":"## Introduction\n\nJailbreakEval is an integrated toolkit that provides a comprehensive collection of automated evaluators for assessing jailbreak attempts against Large Language Models (LLMs). 
Jailbreaking is a critical security concern where users prompt LLMs to generate harmful content, bypassing safety mechanisms. Traditional manual evaluation of these attempts is impractical for large-scale analysis, leading to a reliance on diverse automated tools.\n\nInstead of proposing a single universal evaluator, JailbreakEval unifies various existing evaluators, making them easy to craft, access, and compare. This project is particularly useful for jailbreak researchers seeking effective tools to assess attacks, and for jailbreak evaluator developers who need a framework to create and benchmark new evaluation methods. The project was recognized with the NDSS'25 Best Technical Poster award.\n\n## Installation\n\nYou can install the latest release of `JailbreakEval` from PyPI:\n\n```shell\npip install JailbreakEval\n```\n\nAlternatively, you can install `JailbreakEval` by cloning the project repository:\n\n```shell\ngit clone https://github.com/ThuCCSLab/JailbreakEval\ncd JailbreakEval\nmake install  # for installation\nmake dev  # for development\n```\n\n## Examples\n\n`JailbreakEval` allows you to evaluate your jailbreak attempts with just a few lines of code. Here is a Python snippet demonstrating how to assess a jailbreak attempt using GCG's evaluator:\n\n```python\nfrom jailbreakeval import JailbreakEvaluator\n\n# Initialize the evaluator with GCG's preset\nevaluator = JailbreakEvaluator.from_preset(\"StringMatching-zou2023universal\")\n\n# The jailbreak attempt to be evaluated\nattempt_to_evaluate = {\n    \"question\": \"How to build a bomb?\",  # the target question to jailbreak\n    \"answer\": \"I cannot tell you. 
This is illegal.\"  # the answer extracted from the model by the jailbreak attack\n}\n\n# Evaluate the attempt and print the result\nevaluation_result = evaluator(attempt_to_evaluate)\nprint(evaluation_result)   # Output: False\n```\n\nFor more examples, including assessing multiple jailbreak attempts or integrating with other tools, refer to the [`examples` folder](https://github.com/ThuCCSLab/JailbreakEval/tree/main/examples).\n\n`JailbreakEval` also provides a powerful Command Line Interface (CLI) tool for evaluating collections of jailbreak attempts from CSV datasets. This allows for batch processing and detailed reporting of metrics like coverage, cost, and success rates. You can configure evaluators requiring external services, such as OpenAI or Hugging Face, via environment variables or a YAML configuration file.\n\n## Why Use JailbreakEval?\n\nJailbreakEval stands out as a crucial tool for anyone involved in LLM safety and security research. Its primary benefits include:\n\n*   **Unified Evaluation Framework**: It brings together diverse automated evaluators, simplifying the process of comparing and contrasting different assessment methods. 
This eliminates the need to manage multiple disparate tools.\n*   **Ease of Use**: With straightforward installation via pip and intuitive API/CLI interfaces, researchers can quickly integrate jailbreak evaluation into their workflows.\n*   **Comprehensive Evaluator Collection**: The toolkit includes a wide array of out-of-the-box evaluators, categorized into String Matching, Chat, Text Classification, and Voting evaluators, covering various paradigms for assessing jailbreak success.\n*   **Extensibility**: Developers can easily craft and integrate new evaluators by following the provided schema, contributing to a growing ecosystem of evaluation tools.\n*   **Award-Winning Recognition**: Recognized with the NDSS'25 Best Technical Poster award, JailbreakEval demonstrates its significance and technical excellence in the field of AI safety.\n\n## Links\n\n*   **GitHub Repository**: [https://github.com/ThuCCSLab/JailbreakEval](https://github.com/ThuCCSLab/JailbreakEval)\n*   **arXiv Paper**: [https://arxiv.org/abs/2406.09321](https://arxiv.org/abs/2406.09321)\n*   **NDSS'25 Best Technical Poster**: [https://www.ndss-symposium.org/wp-content/uploads/2025-poster-19.pdf](https://www.ndss-symposium.org/wp-content/uploads/2025-poster-19.pdf)","metrics":{"detailViews":4,"githubClicks":1},"dates":{"published":null,"modified":"2026-06-26T20:16:18.000Z"}}