Repository History

3 repositories tagged with AI Safety

Topic: AI Safety

EasyJailbreak: A Python Framework for Adversarial LLM Jailbreak Prompts

EasyJailbreak is a Python framework for generating adversarial jailbreak prompts for Large Language Models (LLMs). It decomposes the jailbreaking process into iterative steps and provides components for mutation, attack, and evaluation. It is aimed at researchers and developers who study LLM security and model vulnerabilities.

Analyzed Jun 26, 2026


Guardrails: Enhancing LLM Reliability and Structured Data Generation

Guardrails is a Python framework for building reliable AI applications by adding guardrails to large language models. It helps detect, quantify, and mitigate risks in LLM inputs and outputs, and it supports the generation of structured data. The goal is more predictable and safer interactions with AI models.

Analyzed Jun 26, 2026


AuditNLG: Auditing Generative AI for Trustworthiness

AuditNLG is an open-source library from Salesforce for improving the trustworthiness of generative AI language models. It provides state-of-the-art techniques to detect and improve factualness, safety, and constraint adherence in AI-generated text. To simplify auditing of AI outputs, it offers explanations and alternative suggestions for problematic content.

Analyzed Jun 25, 2026

