EasyJailbreak: A Python Framework for Adversarial LLM Jailbreak Prompts

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

EasyJailbreak is a Python framework for generating adversarial jailbreak prompts for Large Language Models (LLMs). It decomposes the jailbreaking process into iterative steps and provides components for mutation, attack, and evaluation. It is aimed at researchers and developers who study LLM security and model vulnerabilities.

Repository Information

Analyzed by OSRepos on June 26, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

EasyJailbreak is an easy-to-use Python framework for researchers and developers working on Large Language Model (LLM) security. It generates adversarial jailbreak prompts by assembling various methods. The framework decomposes the mainstream jailbreaking process into several steps that can be iterated: initializing mutation seeds, selecting suitable seeds, adding constraints, mutating, attacking, and evaluating. Because each step is a separate component, the framework also serves as a flexible playground for research and experimentation on LLM safety and vulnerabilities.
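
The six steps listed above can be pictured as a generic search loop. The following is a minimal sketch with deliberately trivial stand-ins, not EasyJailbreak's own classes: `Candidate`, `run_pipeline`, and every component function are hypothetical names invented for illustration, the "target" is a local function that upper-cases its input, and the constraint is applied to the selected seed simply because that is the order in which the steps are listed. The real framework may organize these steps differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """A prompt candidate with the response and score it received."""
    prompt: str
    response: str = ""
    score: float = 0.0


def run_pipeline(
    seeds: List[str],
    select: Callable[[List[Candidate]], Candidate],
    constraint: Callable[[Candidate], bool],
    mutate: Callable[[Candidate], List[Candidate]],
    query_target: Callable[[str], str],
    evaluate: Callable[[Candidate], float],
    max_iterations: int = 5,
    threshold: float = 1.0,
) -> List[Candidate]:
    """Run the select -> constrain -> mutate -> attack -> evaluate loop."""
    pool = [Candidate(prompt=s) for s in seeds]          # 1. initialize seeds
    history: List[Candidate] = []
    for _ in range(max_iterations):
        parent = select(pool)                            # 2. select a seed
        if not constraint(parent):                       # 3. apply constraints
            pool.remove(parent)
            if not pool:
                break
            continue
        for child in mutate(parent):                     # 4. mutate
            child.response = query_target(child.prompt)  # 5. attack (query target)
            child.score = evaluate(child)                # 6. evaluate
            pool.append(child)
            history.append(child)
            if child.score >= threshold:                 # stop at first success
                return history
    return history


# Toy components (all hypothetical): they only exercise the control flow.
def select_best(pool: List[Candidate]) -> Candidate:
    return max(pool, key=lambda c: c.score)

def short_enough(c: Candidate) -> bool:
    return len(c.prompt) <= 40

def append_marker(c: Candidate) -> List[Candidate]:
    return [Candidate(prompt=c.prompt + "!")]

def echo_target(prompt: str) -> str:
    return prompt.upper()

def marker_score(c: Candidate) -> float:
    return min(1.0, c.response.count("!") / 3)


history = run_pipeline(["alpha", "beta"], select_best, short_enough,
                       append_marker, echo_target, marker_score)
print([c.prompt for c in history])  # ['alpha!', 'alpha!!', 'alpha!!!']
```

Passing each step in as a plain callable mirrors the modular idea: any one step can be swapped without touching the loop.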

For more detail, see the official paper, the EasyJailbreak website (which reports jailbreak results for different LLMs), and the documentation, which explains the API and its parameters.

Installation

EasyJailbreak requires Python 3.9 or later (python>=3.9). There are two ways to install it:

  1. For users who only require the collected approaches (recipes):
    pip install easyjailbreak
    
  2. For users interested in adding new components (e.g., new mutate or evaluate methods):
    git clone https://github.com/EasyJailbreak/EasyJailbreak.git
    cd EasyJailbreak
    pip install -e .
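
Before or after either install method, a quick environment check can confirm the Python version requirement and whether the package is visible to the interpreter. This is a small sketch using only the standard library. The helper names `python_ok` and `installed_version` are hypothetical, and the distribution name is assumed to match the `pip install easyjailbreak` command above.

```python
import sys
from importlib import metadata
from typing import Optional

MIN_PYTHON = (3, 9)  # from the stated requirement python>=3.9


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name: str = "easyjailbreak") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python OK:", python_ok())
print("easyjailbreak version:", installed_version())
```

A result of `None` for the version means that the package is not installed in the active environment, which commonly happens when `pip` and `python` point at different interpreters.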
    

Examples

EasyJailbreak provides a straightforward API for running its pre-implemented attack "recipes" against various models. The example below runs the PAIR recipe with a Vicuna attack model and GPT-4 as both the target and the evaluation model:

from easyjailbreak.attacker.PAIR_chao_2023 import PAIR
from easyjailbreak.datasets import JailbreakDataset
from easyjailbreak.models.huggingface_model import from_pretrained
from easyjailbreak.models.openai_model import OpenaiModel

# First, prepare models and datasets.
attack_model = from_pretrained(model_name_or_path='lmsys/vicuna-13b-v1.5',
                               model_name='vicuna_v1.1')
target_model = OpenaiModel(model_name='gpt-4',
                           api_keys='INPUT YOUR KEY HERE!!!')
eval_model = OpenaiModel(model_name='gpt-4',
                         api_keys='INPUT YOUR KEY HERE!!!')
dataset = JailbreakDataset('AdvBench')

# Then instantiate the recipe.
attacker = PAIR(attack_model=attack_model,
                target_model=target_model,
                eval_model=eval_model,
                jailbreak_datasets=dataset)

# Finally, start jailbreaking.
attacker.attack(save_path='vicuna-13b-v1.5_gpt4_gpt4_AdvBench_result.jsonl')
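
The `save_path` above ends in `.jsonl`, which suggests one JSON object per line. The sketch below relies on that assumption. It loads such a file and reports which fields appear, without assuming any particular field names, because the exact record schema is not described here. `parse_jsonl`, `load_jsonl`, and `field_counts` are hypothetical helper names, and the sample records are invented.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[dict]:
    """Parse JSON Lines text, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_jsonl(path) -> List[dict]:
    """Read a .jsonl file from disk."""
    with Path(path).open(encoding="utf-8") as fh:
        return parse_jsonl(fh)


def field_counts(records: Iterable[dict]) -> Counter:
    """Count how many records contain each top-level field."""
    counts: Counter = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


# Invented sample lines; the field names here are placeholders only.
sample = ['{"id": 1, "note": "a"}', "", '{"id": 2}']
records = parse_jsonl(sample)
print(len(records), dict(field_counts(records)))
```

Running `field_counts(load_jsonl('vicuna-13b-v1.5_gpt4_gpt4_AdvBench_result.jsonl'))` on a real result file is a quick way to discover the schema before writing any analysis code.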

For more advanced customization, such as loading models and datasets, initializing seeds, and instantiating individual components (Selectors, Mutators, Constraints, Evaluators), refer to the documentation.
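
When loading the OpenAI-backed models from the example, it is usually safer to read the API key from the environment than to paste it into the script. A minimal sketch follows. `require_env` is a hypothetical helper, and the variable name `OPENAI_API_KEY` is a common convention rather than something EasyJailbreak is known to require.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a non-empty environment variable or raise a clear error."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value


# Hypothetical usage with the example above (not executed here):
#   api_key = require_env("OPENAI_API_KEY")
#   target_model = OpenaiModel(model_name='gpt-4', api_keys=api_key)
```

Failing early with an explicit message avoids starting a long run that only fails at the first API call.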

Why Use EasyJailbreak?

EasyJailbreak is useful for several reasons:

  • Ease of Use: It offers a Python framework that simplifies the multi-step process of generating adversarial prompts.
  • Modular Design: The framework's decomposition into distinct, iterable steps allows flexible experimentation and the development of custom attack methods.
  • Comprehensive Recipes: It collects and implements attack recipes from the relevant papers, providing a ready-to-use toolkit for evaluating LLM vulnerabilities.
  • LLM Security Focus: Designed specifically for LLM security research, it helps identify and understand potential weaknesses in large language models.
  • Extensibility: Researchers can add new components, such as novel mutation techniques or evaluation methods, to support further LLM safety research.
