{"name":"EasyJailbreak: A Python Framework for Adversarial LLM Jailbreak Prompts","description":"EasyJailbreak is an intuitive Python framework designed for generating adversarial jailbreak prompts for Large Language Models (LLMs). It provides a structured approach to decompose the jailbreaking process into iterative steps, offering components for mutation, attack, and evaluation. This tool is ideal for researchers and developers focused on LLM security and understanding model vulnerabilities.","github":"https://github.com/EasyJailbreak/EasyJailbreak","url":"https://osrepos.com/repo/easyjailbreak-easyjailbreak","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/easyjailbreak-easyjailbreak","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/easyjailbreak-easyjailbreak.md","json":"https://osrepos.com/repo/easyjailbreak-easyjailbreak.json","topics":["Python","Jailbreak","LLM Security","Adversarial AI","AI Safety","Large Language Model","Framework"],"keywords":["Python","Jailbreak","LLM Security","Adversarial AI","AI Safety","Large Language Model","Framework"],"stars":null,"summary":"EasyJailbreak is an intuitive Python framework designed for generating adversarial jailbreak prompts for Large Language Models (LLMs). It provides a structured approach to decompose the jailbreaking process into iterative steps, offering components for mutation, attack, and evaluation. This tool is ideal for researchers and developers focused on LLM security and understanding model vulnerabilities.","content":"## Introduction\n\nEasyJailbreak is an easy-to-use Python framework specifically designed for researchers and developers focusing on Large Language Model (LLM) security. It provides a robust platform for generating adversarial jailbreak prompts by assembling various methods. 
The framework decomposes the mainstream jailbreaking process into several iterable steps: initializing mutation seeds, selecting suitable seeds, adding constraints, mutating, attacking, and evaluating. This modular design creates a flexible playground for further research and experimentation in LLM safety and vulnerability.\n\nFor more in-depth information, you can refer to the [official paper](https://arxiv.org/pdf/2403.12171.pdf), explore different LLMs' jailbreak results on the [EasyJailbreak Website](http://easyjailbreak.org/), and consult the [detailed documentation](https://easyjailbreak.github.io/EasyJailbreakDoc.github.io) for API and parameter explanations.\n\n## Installation\n\nTo get started with EasyJailbreak, ensure you have `python>=3.9` installed. There are two primary methods for installation:\n\n1.  **For users who only require the collected approaches (recipes):**\n    ```bash\n    pip install easyjailbreak\n    ```\n\n2.  **For users interested in adding new components (e.g., new mutate or evaluate methods):**\n    ```bash\n    git clone https://github.com/EasyJailbreak/EasyJailbreak.git\n    cd EasyJailbreak\n    pip install -e .\n    ```\n\n## Examples\n\nEasyJailbreak provides a straightforward API to utilize its pre-implemented attack \"recipes\" on various models. 
Here's an example demonstrating how to use the `PAIR` recipe:\n\n```python\nfrom easyjailbreak.attacker.PAIR_chao_2023 import PAIR\nfrom easyjailbreak.datasets import JailbreakDataset\nfrom easyjailbreak.models.huggingface_model import from_pretrained\nfrom easyjailbreak.models.openai_model import OpenaiModel\n\n# First, prepare models and datasets.\nattack_model = from_pretrained(model_name_or_path='lmsys/vicuna-13b-v1.5',\n                               model_name='vicuna_v1.1')\ntarget_model = OpenaiModel(model_name='gpt-4',\n                         api_keys='INPUT YOUR KEY HERE!!!')\neval_model = OpenaiModel(model_name='gpt-4',\n                         api_keys='INPUT YOUR KEY HERE!!!')\ndataset = JailbreakDataset('AdvBench')\n\n# Then instantiate the recipe.\nattacker = PAIR(attack_model=attack_model,\n                target_model=target_model,\n                eval_model=eval_model,\n                jailbreak_datasets=dataset)\n\n# Finally, start jailbreaking.\nattacker.attack(save_path='vicuna-13b-v1.5_gpt4_gpt4_AdvBench_result.jsonl')\n```\n\nFor more advanced customization, such as loading models, datasets, initializing seeds, and instantiating individual components (Selectors, Mutators, Constraints, Evaluators), refer to the comprehensive [documentation](https://easyjailbreak.github.io/EasyJailbreakDoc.github.io/).\n\n## Why Use EasyJailbreak?\n\nEasyJailbreak stands out as a valuable tool for several reasons:\n\n*   **Ease of Use:** It offers an intuitive Python framework, simplifying the complex process of generating adversarial prompts.\n*   **Modular Design:** The framework's decomposition into distinct, iterable steps allows for flexible experimentation and the development of custom attack methods.\n*   **Comprehensive Recipes:** It collects and implements numerous attack recipes from relevant papers, providing a ready-to-use toolkit for evaluating LLM vulnerabilities.\n*   **LLM Security Focus:** Designed specifically for LLM security research, it helps 
identify and understand potential weaknesses in large language models.\n*   **Extensibility:** Researchers can easily integrate new components, such as novel mutation techniques or evaluation metrics, to push the boundaries of LLM safety research.\n\n## Links\n\n*   **GitHub Repository:** [https://github.com/EasyJailbreak/EasyJailbreak](https://github.com/EasyJailbreak/EasyJailbreak)\n*   **Official Website:** [http://easyjailbreak.org/](http://easyjailbreak.org/)\n*   **Documentation:** [https://easyjailbreak.github.io/EasyJailbreakDoc.github.io](https://easyjailbreak.github.io/EasyJailbreakDoc.github.io)\n*   **Research Paper:** [https://arxiv.org/pdf/2403.12171.pdf](https://arxiv.org/pdf/2403.12171.pdf)","metrics":{"detailViews":1,"githubClicks":0},"dates":{"published":null,"modified":"2026-06-26T15:37:34.000Z"}}