MarkLLM: An Open-Source Toolkit for LLM Watermarking

Introduction

MarkLLM is an open-source toolkit for research on, and application of, watermarking technologies for Large Language Models (LLMs). As the use of LLMs expands, verifying the authenticity and origin of machine-generated text has become critically important. MarkLLM addresses this need with a unified, extensible platform that makes watermarking algorithms easier to access, understand, and assess.

The toolkit supports a wide array of watermarking methods and provides a streamlined way to integrate new methods and extend existing ones. It also includes custom visualization tools that show how different algorithms operate, and an evaluation module with 12 tools that assess detectability, robustness, and impact on text quality. MarkLLM was accepted as an EMNLP 2024 Demo.

Installation

To get started with MarkLLM, follow these steps:

Python and PyTorch: Ensure you have Python 3.10 and PyTorch installed.
Install dependencies:

pip install -r requirements.txt

Cython files (for EXPEdit or ITSEdit): If you plan to use the EXPEdit or ITSEdit algorithms, you'll need to compile their Cython files:

python watermark/exp_edit/cython_files/setup.py build_ext --inplace

Then, move the generated .so file into watermark/exp_edit/cython_files/.
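
The move step above can be scripted. The following is a minimal sketch, assuming the build leaves one or more .so files directly in the directory you ran the command from (Linux or macOS naming). The helper name move_built_extensions and its arguments are illustrative and not part of MarkLLM.

```python
import shutil
from pathlib import Path


def move_built_extensions(source_dir, dest_dir):
    """Move every *.so file found directly in source_dir into dest_dir.

    The search is deliberately non-recursive so that shared libraries in
    nested folders (for example a virtual environment) are never touched.
    Returns the destination paths of the files that were moved.
    """
    source_dir = Path(source_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Nothing to do if the files are already where they belong.
    if source_dir.resolve() == dest_dir.resolve():
        return []

    moved = []
    # Materialize the listing first so moving files cannot disturb it.
    for so_file in sorted(source_dir.glob("*.so")):
        target = dest_dir / so_file.name
        shutil.move(str(so_file), str(target))
        moved.append(target)
    return moved
```

Under those assumptions, calling move_built_extensions(".", "watermark/exp_edit/cython_files") from the repository root would relocate the compiled files. If your build places them elsewhere, pass that directory as source_dir instead.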

Hugging Face Models: Model weights for the self-trained watermarking algorithms are hosted on Hugging Face under Generative-Watermark-Toolkits. Before running the code, download the models you need, as indicated by the paths in the config files, and save them to the model/ directory.
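
Before running anything, it can help to confirm that the prerequisites listed above are in place. The sketch below is one possible pre-flight check, not something shipped with MarkLLM. It assumes the algorithm configs are JSON files (as config/KGW.json suggests) and that model locations appear in them as string values beginning with model/. That second assumption is a guess about the config layout, so adjust the prefix argument if your files differ. All function names are hypothetical.

```python
import importlib.util
import json
import sys
from pathlib import Path


def python_version_ok(version_info=sys.version_info, required=(3, 10)):
    """True if the interpreter's major.minor version equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


def module_available(name):
    """True if top-level module `name` is importable (it is not imported)."""
    return importlib.util.find_spec(name) is not None


def iter_strings(value):
    """Yield every string found anywhere inside a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def missing_model_paths(config_path, repo_root=".", prefix="model/"):
    """List config values that look like model paths but are absent on disk.

    ASSUMPTION: model locations are stored as string values that start
    with `prefix`; this is not confirmed by the MarkLLM documentation.
    """
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    root = Path(repo_root)
    candidates = sorted({s for s in iter_strings(config) if s.startswith(prefix)})
    return [s for s in candidates if not (root / s).exists()]
```

A typical use would be to call python_version_ok() and module_available("torch"), then run missing_model_paths on each config file you plan to use and download whatever it reports.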

Examples

MarkLLM provides intuitive interfaces for invoking watermarking algorithms, visualizing their mechanisms, and applying evaluation pipelines.

Invoking Watermarking Algorithms

import torch
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Transformers config
transformers_config = TransformersConfig(model=AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b').to(device),
                                         tokenizer=AutoTokenizer.from_pretrained('facebook/opt-1.3b'),
                                         vocab_size=50272,
                                         device=device,
                                         max_new_tokens=200,
                                         min_length=230,
                                         do_sample=True,
                                         no_repeat_ngram_size=4)
  
# Load watermark algorithm
myWatermark = AutoWatermark.load('KGW', 
                                 algorithm_config='config/KGW.json',
                                 transformers_config=transformers_config)

# Prompt
prompt = 'Good Morning.'

# Generate watermarked text and run detection on it
watermarked_text = myWatermark.generate_watermarked_text(prompt)
watermarked_result = myWatermark.detect_watermark(watermarked_text)

# Generate unwatermarked text from the same prompt for comparison
unwatermarked_text = myWatermark.generate_unwatermarked_text(prompt)
unwatermarked_result = myWatermark.detect_watermark(unwatermarked_text)
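
To give a feel for what a detection call like the one above can be based on, here is a toy sketch of green-list detection in the style of KGW. It is not MarkLLM's implementation. The hash-based partition, the integer token IDs, the hard rejection rule, and the names is_green, generate_tokens, and green_z_score are all simplifying assumptions made for illustration. The only part taken from the published method is the z-score formula, z = (green - gamma * T) / sqrt(T * gamma * (1 - gamma)).

```python
import hashlib
import math
import random


def is_green(prev_token, token, gamma=0.5, key=0):
    """Pseudo-randomly place `token` on the green list seeded by `prev_token`.

    For any previous token, roughly a fraction `gamma` of tokens is green.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < gamma


def generate_tokens(length, vocab_size=1000, watermark=False,
                    gamma=0.5, key=0, seed=0):
    """Draw random token IDs; with watermark=True, keep only green ones."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size)]
    while len(tokens) < length:
        candidate = rng.randrange(vocab_size)
        # Hard rule: reject any candidate outside the current green list.
        if watermark and not is_green(tokens[-1], candidate, gamma, key):
            continue
        tokens.append(candidate)
    return tokens


def green_z_score(tokens, gamma=0.5, key=0):
    """z-score of the green-token count against the no-watermark expectation."""
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, gamma, key) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

With the hard rule every scored token is green, so 101 tokens (100 scored) at gamma = 0.5 give z = (100 - 50) / sqrt(25) = 10, while unwatermarked random tokens should score far lower. Real schemes bias the model's logits rather than rejecting tokens outright, and a detector compares the score with a threshold.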

Visualizing Mechanisms

MarkLLM offers visualization tools to understand how watermarks are embedded. Here's an example for the KGW family:

import torch
from visualize.font_settings import FontSettings
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from visualize.visualizer import DiscreteVisualizer
from visualize.legend_settings import DiscreteLegendSettings
from visualize.page_layout_settings import PageLayoutSettings
from visualize.color_scheme import ColorSchemeForDiscreteVisualization

# Load watermark algorithm and get data
# ... (similar setup as above for myWatermark)
watermarked_data = myWatermark.get_data_for_visualization(watermarked_text)
unwatermarked_data = myWatermark.get_data_for_visualization(unwatermarked_text)

# Init visualizer
visualizer = DiscreteVisualizer(color_scheme=ColorSchemeForDiscreteVisualization(),
                                font_settings=FontSettings(), 
                                page_layout_settings=PageLayoutSettings(),
                                legend_settings=DiscreteLegendSettings())
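# Visualize the watermarked and unwatermarked text
watermarked_img = visualizer.visualize(data=watermarked_data, 
                                       show_text=True, 
                                       visualize_weight=True, 
                                       display_legend=True)
unwatermarked_img = visualizer.visualize(data=unwatermarked_data, 
                                         show_text=True, 
                                         visualize_weight=True, 
                                         display_legend=True)
# Save both images for side-by-side comparison
watermarked_img.save("KGW_watermarked.png")
unwatermarked_img.save("KGW_unwatermarked.png")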

Applying Evaluation Pipelines

The toolkit includes pipelines for watermark detection and text quality analysis.

import torch
from evaluation.dataset import C4Dataset
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from evaluation.tools.text_editor import TruncatePromptTextEditor, WordDeletion
from evaluation.tools.success_rate_calculator import DynamicThresholdSuccessRateCalculator
from evaluation.pipelines.detection import WatermarkedTextDetectionPipeline, UnWatermarkedTextDetectionPipeline, DetectionPipelineReturnType

# Load dataset, device, and transformers config
# ... (similar setup as above)

# Load watermark algorithm
my_watermark = AutoWatermark.load('KGW', 
                                  algorithm_config='config/KGW.json',
                                  transformers_config=transformers_config)

# Init pipelines
pipeline1 = WatermarkedTextDetectionPipeline(
    dataset=my_dataset, 
    text_editor_list=[TruncatePromptTextEditor(), WordDeletion(ratio=0.3)],
    show_progress=True, 
    return_type=DetectionPipelineReturnType.SCORES) 

pipeline2 = UnWatermarkedTextDetectionPipeline(dataset=my_dataset, 
                                               text_editor_list=[],
                                               show_progress=True,
                                               return_type=DetectionPipelineReturnType.SCORES)

# Evaluate
calculator = DynamicThresholdSuccessRateCalculator(labels=['TPR', 'F1'], rule='best')
print(calculator.calculate(pipeline1.evaluate(my_watermark), pipeline2.evaluate(my_watermark)))
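
As a rough illustration of what a dynamic-threshold success-rate calculation can involve, the sketch below sweeps candidate thresholds over two lists of detection scores and keeps the one with the highest F1, then reports TPR and F1 at that threshold. This is an assumption about the general idea behind a "best" rule, not MarkLLM's code. The function names and the dictionary layout of the result are hypothetical.

```python
def confusion_counts(watermarked_scores, unwatermarked_scores, threshold):
    """Count TP, FP, FN when scores >= threshold are flagged as watermarked."""
    tp = sum(s >= threshold for s in watermarked_scores)
    fn = len(watermarked_scores) - tp
    fp = sum(s >= threshold for s in unwatermarked_scores)
    return tp, fp, fn


def best_threshold_metrics(watermarked_scores, unwatermarked_scores):
    """Return the threshold with the highest F1, plus TPR and F1 at it."""
    candidates = sorted(set(watermarked_scores) | set(unwatermarked_scores))
    if not candidates:
        raise ValueError("need at least one score")

    best = None
    for threshold in candidates:
        tp, fp, fn = confusion_counts(
            watermarked_scores, unwatermarked_scores, threshold)
        tpr = tp / (tp + fn) if tp + fn else 0.0        # recall
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * tpr / (precision + tpr)
              if precision + tpr else 0.0)
        # Strict '>' keeps the lowest threshold among equally good ones.
        if best is None or f1 > best["F1"]:
            best = {"threshold": threshold, "TPR": tpr, "F1": f1}
    return best
```

Every observed score is tried as a threshold because the metrics can only change at those values, so nothing is gained by sweeping a finer grid.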

More examples and detailed usage instructions can be found in the test/ and evaluation/examples/ directories of the repository, as well as in the provided Jupyter notebooks.

Why Use MarkLLM?

Researchers and developers working on LLM watermarking may find MarkLLM useful for several reasons:

Unified Framework: It provides a consistent and extensible platform for implementing and comparing various watermarking algorithms, simplifying development and research.
Comprehensive Evaluation: With 12 evaluation tools covering detectability, robustness, and text quality, along with customizable automated pipelines, MarkLLM supports thorough assessment of watermarking technologies.
Enhanced Understanding: Visualization tools offer clear insights into the operational mechanisms of different algorithms, making complex concepts more accessible.
Active Development: The toolkit is actively maintained and updated with new watermarking methods and features, reflecting the latest advancements in the field.
Community Contribution: It encourages community contributions, fostering a collaborative environment for making text watermarking more accessible.