EasyJailbreak: A Python Framework for Adversarial LLM Jailbreak Prompts
This repository profile is provided by osrepos.com, an open source repository discovery platform.
Summary
EasyJailbreak is an intuitive Python framework designed for generating adversarial jailbreak prompts for Large Language Models (LLMs). It provides a structured approach to decompose the jailbreaking process into iterative steps, offering components for mutation, attack, and evaluation. This tool is ideal for researchers and developers focused on LLM security and understanding model vulnerabilities.
Introduction
EasyJailbreak is an easy-to-use Python framework specifically designed for researchers and developers focusing on Large Language Model (LLM) security. It provides a robust platform for generating adversarial jailbreak prompts by assembling various methods. The framework decomposes the mainstream jailbreaking process into several iterable steps: initializing mutation seeds, selecting suitable seeds, adding constraints, mutating, attacking, and evaluating. This modular design creates a flexible playground for further research and experimentation in LLM safety and vulnerability.
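The step decomposition above can be pictured as a generic search loop. The following is a minimal sketch only and does not use the EasyJailbreak API: `run_pipeline` and every callable passed to it are hypothetical names, the stand-ins are harmless string functions, and applying the constraint as a filter on mutated candidates is an assumption of this sketch.

```python
from typing import Callable, List, Tuple


def run_pipeline(
    seeds: List[str],
    select: Callable[[List[str]], str],
    mutate: Callable[[str], List[str]],
    constraint: Callable[[str], bool],
    query_target: Callable[[str], str],
    evaluate: Callable[[str, str], float],
    max_iters: int = 5,
    threshold: float = 1.0,
) -> Tuple[str, str, float]:
    """Hypothetical select -> mutate -> constrain -> attack -> evaluate loop.

    Returns the best (prompt, response, score) triple seen.
    """
    pool = list(seeds)  # initialize the seed pool
    best = ("", "", float("-inf"))
    for _ in range(max_iters):
        seed = select(pool)  # pick a seed from the pool
        # Mutate the seed, then keep only candidates that pass the constraint.
        candidates = [c for c in mutate(seed) if constraint(c)]
        for prompt in candidates:
            response = query_target(prompt)  # query the (stubbed) target
            score = evaluate(prompt, response)  # score the response
            if score > best[2]:
                best = (prompt, response, score)
            pool.append(prompt)  # candidates become future seeds
        if best[2] >= threshold:
            break  # stop once a candidate reaches the threshold
    return best


# Toy stand-ins: plain string operations, no model is called.
best = run_pipeline(
    seeds=["alpha", "beta"],
    select=lambda pool: pool[-1],
    mutate=lambda s: [s + "!", s.upper()],
    constraint=lambda c: len(c) <= 8,
    query_target=lambda p: p[::-1],
    evaluate=lambda p, r: 1.0 if r.startswith("!") else 0.0,
)
print(best)
```

Passing each step in as a callable mirrors the modular idea described above: any one step can be swapped without touching the loop.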
For more detail, see the official paper, the EasyJailbreak website (which reports jailbreak results for different LLMs), and the documentation, which explains the API and its parameters. All three are listed under Links.
Installation
To get started with EasyJailbreak, ensure you have python>=3.9 installed. There are two primary methods for installation:
- For users who only require the collected approaches (recipes):
pip install easyjailbreak
- For users interested in adding new components (e.g., new mutate or evaluate methods):
git clone https://github.com/EasyJailbreak/EasyJailbreak.git
cd EasyJailbreak
pip install -e .
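Before or after installing, it can help to confirm that the interpreter meets the stated minimum version and that the package is importable. This is a small sketch using only the standard library. The helper names `python_is_supported` and `package_is_installed` are made up for illustration, and the import name `easyjailbreak` is taken from the example imports in this profile.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the installation notes


def python_is_supported(version=None, minimum=MIN_PYTHON) -> bool:
    """Return True if the given (or current) Python version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def package_is_installed(name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


print("Python version OK:", python_is_supported())
print("easyjailbreak installed:", package_is_installed("easyjailbreak"))
```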
Examples
EasyJailbreak provides a straightforward API to utilize its pre-implemented attack "recipes" on various models. Here's an example demonstrating how to use the PAIR recipe:
from easyjailbreak.attacker.PAIR_chao_2023 import PAIR
from easyjailbreak.datasets import JailbreakDataset
from easyjailbreak.models.huggingface_model import from_pretrained
from easyjailbreak.models.openai_model import OpenaiModel
# First, prepare models and datasets.
attack_model = from_pretrained(model_name_or_path='lmsys/vicuna-13b-v1.5',
                               model_name='vicuna_v1.1')
target_model = OpenaiModel(model_name='gpt-4',
                           api_keys='INPUT YOUR KEY HERE!!!')
eval_model = OpenaiModel(model_name='gpt-4',
                         api_keys='INPUT YOUR KEY HERE!!!')
dataset = JailbreakDataset('AdvBench')
# Then instantiate the recipe.
attacker = PAIR(attack_model=attack_model,
                target_model=target_model,
                eval_model=eval_model,
                jailbreak_datasets=dataset)
# Finally, start jailbreaking.
attacker.attack(save_path='vicuna-13b-v1.5_gpt4_gpt4_AdvBench_result.jsonl')
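The call above saves results to a `.jsonl` path. As a hedged sketch for a first look at such a file, the helpers below assume it holds one JSON object per line, as the extension suggests. The record fields are not described here, so the code assumes none and only counts the top-level keys it finds. `parse_jsonl_lines`, `load_jsonl`, and `summarize_keys` are illustrative names, not part of EasyJailbreak.

```python
import json
from collections import Counter
from pathlib import Path


def parse_jsonl_lines(lines):
    """Parse one JSON value per non-empty line from an iterable of strings."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_jsonl(path):
    """Read a JSON Lines file from disk and return its records as a list."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_jsonl_lines(handle)


def summarize_keys(records):
    """Count how often each top-level key appears across dict records."""
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


# Only runs if the result file from the example above exists.
result_path = Path("vicuna-13b-v1.5_gpt4_gpt4_AdvBench_result.jsonl")
if result_path.exists():
    records = load_jsonl(result_path)
    print(len(records), "records; keys:", dict(summarize_keys(records)))
```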
For more advanced customization, such as loading models, datasets, initializing seeds, and instantiating individual components (Selectors, Mutators, Constraints, Evaluators), refer to the comprehensive documentation.
Why Use EasyJailbreak?
EasyJailbreak stands out as a valuable tool for several reasons:
- Ease of Use: It offers an intuitive Python framework, simplifying the complex process of generating adversarial prompts.
- Modular Design: The framework's decomposition into distinct, iterable steps allows for flexible experimentation and the development of custom attack methods.
- Comprehensive Recipes: It collects and implements numerous attack recipes from relevant papers, providing a ready-to-use toolkit for evaluating LLM vulnerabilities.
- LLM Security Focus: Designed specifically for LLM security research, it helps identify and understand potential weaknesses in large language models.
- Extensibility: Researchers can easily integrate new components, such as novel mutation techniques or evaluation metrics, to push the boundaries of LLM safety research.
Links
- GitHub Repository: https://github.com/EasyJailbreak/EasyJailbreak
- Official Website: http://easyjailbreak.org/
- Documentation: https://easyjailbreak.github.io/EasyJailbreakDoc.github.io
- Research Paper: https://arxiv.org/pdf/2403.12171.pdf