MarkLLM: An Open-Source Toolkit for LLM Watermarking

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

MarkLLM is an open-source toolkit designed to simplify the research and application of watermarking technologies for large language models (LLMs). It offers a unified framework for implementing various watermarking algorithms, alongside robust visualization and comprehensive evaluation tools. This toolkit helps researchers and the broader community understand and assess the authenticity and origin of machine-generated text.

Repository Information

Analyzed by OSRepos on June 23, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

MarkLLM is a comprehensive, open-source toolkit developed to advance the research and application of watermarking technologies for Large Language Models (LLMs). As the use of LLMs expands, ensuring the authenticity and origin of machine-generated text has become critically important. MarkLLM addresses this need by providing a unified and extensible platform that simplifies access, understanding, and assessment of various watermarking algorithms.

The toolkit supports a wide array of watermarking methods and provides a common interface for integrating existing techniques and adding new ones. It also includes custom visualization tools that show how different algorithms operate, and an evaluation module with 12 tools for assessing detectability, robustness, and impact on text quality. MarkLLM was accepted as a demo paper at EMNLP 2024.
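To make the idea of a generation-time watermark concrete, here is a minimal sketch of a KGW-style scheme (KGW is the algorithm used in the examples below): the previous token seeds a pseudo-random "green list" covering a fraction gamma of the vocabulary, and a bias delta is added to the logits of green tokens before sampling. This is a simplified illustration, not MarkLLM's implementation; the SHA-256 seeding, the key value 42, and the helper names green_list and bias_logits are hypothetical choices made for this sketch.

```python
import hashlib
import math
import random


def green_list(prev_token: int, vocab_size: int, gamma: float, key: int = 42) -> set[int]:
    """Return a pseudo-random gamma-fraction of token ids, seeded by the previous token.

    The SHA-256 seeding and the default key are hypothetical choices for this sketch.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest()
    rng = random.Random(int(digest, 16))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits: list[float], prev_token: int,
                gamma: float = 0.5, delta: float = 2.0) -> list[float]:
    """Add delta to the logit of every green-list token (KGW-style soft watermark)."""
    greens = green_list(prev_token, len(logits), gamma)
    return [x + delta if i in greens else x for i, x in enumerate(logits)]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: with flat logits over 10 tokens, the biased distribution favors green tokens.
probs = softmax(bias_logits([0.0] * 10, prev_token=3))
```

Because green tokens receive extra probability mass, watermarked text contains more of them than chance would predict, and that surplus is what a detector tests for.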

Installation

To get started with MarkLLM, follow these steps:

  1. Python and PyTorch: Ensure you have Python 3.10 and PyTorch installed.
  2. Install dependencies:
pip install -r requirements.txt
  3. Cython files (for EXPEdit or ITSEdit): If you plan to use the EXPEdit or ITSEdit algorithms, you'll need to compile their Cython files:
python watermark/exp_edit/cython_files/setup.py build_ext --inplace

Then, move the generated .so file into watermark/exp_edit/cython_files/.

  4. Hugging Face models: Model weights for the self-trained watermarking algorithms are hosted in the Generative-Watermark-Toolkits repository on Hugging Face. Download the models required by the paths in each algorithm's config and save them to the model/ directory before running the code.
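Before running the examples, a small script can confirm that the installation steps were completed. The helper below is hypothetical (check_environment is not part of MarkLLM). It only checks that Python is at least 3.10, that torch and transformers are importable, that a compiled .so file sits in watermark/exp_edit/cython_files/ (needed only for EXPEdit or ITSEdit), and that a model/ directory exists; it does not verify which models were downloaded.

```python
import importlib.util
import sys
from pathlib import Path


def check_environment(repo_root: str = ".") -> dict[str, bool]:
    """Hypothetical pre-flight check for the installation steps (not a MarkLLM API)."""
    root = Path(repo_root)
    cython_dir = root / "watermark" / "exp_edit" / "cython_files"
    return {
        # Step 1: the profile asks for Python 3.10 and PyTorch.
        "python>=3.10": sys.version_info >= (3, 10),
        "torch installed": importlib.util.find_spec("torch") is not None,
        # The examples import transformers, assumed to come from requirements.txt.
        "transformers installed": importlib.util.find_spec("transformers") is not None,
        # Step 3 (EXPEdit/ITSEdit only): a compiled .so file in cython_files/.
        "cython extension built": cython_dir.is_dir() and any(cython_dir.glob("*.so")),
        # Step 4: downloaded model weights are expected under model/.
        "model/ directory present": (root / "model").is_dir(),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"[{'ok' if ok else 'missing'}] {name}")
```

Run it from the repository root; any "missing" line points back to the corresponding installation step.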

Examples

MarkLLM provides intuitive interfaces for invoking watermarking algorithms, visualizing their mechanisms, and applying evaluation pipelines.

Invoking Watermarking Algorithms

import torch
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Transformers config: model, tokenizer, and generation settings
transformers_config = TransformersConfig(model=AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b').to(device),
                                         tokenizer=AutoTokenizer.from_pretrained('facebook/opt-1.3b'),
                                         vocab_size=50272,
                                         device=device,
                                         max_new_tokens=200,
                                         min_length=230,
                                         do_sample=True,
                                         no_repeat_ngram_size=4)

# Load watermark algorithm (KGW) with its JSON config
myWatermark = AutoWatermark.load('KGW',
                                 algorithm_config='config/KGW.json',
                                 transformers_config=transformers_config)

# Prompt
prompt = 'Good Morning.'

# Generate watermarked text and run detection on it
watermarked_text = myWatermark.generate_watermarked_text(prompt)
watermarked_detect_result = myWatermark.detect_watermark(watermarked_text)

# Generate unwatermarked text from the same prompt for comparison
unwatermarked_text = myWatermark.generate_unwatermarked_text(prompt)
unwatermarked_detect_result = myWatermark.detect_watermark(unwatermarked_text)

print(watermarked_detect_result)
print(unwatermarked_detect_result)
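For KGW-family algorithms, detection counts how many scored tokens fall in their green lists and converts that count into a one-proportion z-score, z = (g - γT) / sqrt(T·γ·(1 - γ)), where T is the number of scored tokens, g the number of green tokens, and γ the green-list fraction. The sketch below assumes each token's green/red membership has already been computed. The defaults γ = 0.25 and threshold z = 4.0 are illustrative assumptions, not values read from config/KGW.json, and the returned dict keys are hypothetical rather than MarkLLM's documented result format.

```python
import math


def kgw_z_score(green_flags: list[bool], gamma: float) -> float:
    """One-proportion z-test: how far the green count exceeds the expected gamma * T."""
    t = len(green_flags)
    if t == 0:
        raise ValueError("need at least one scored token")
    g = sum(green_flags)  # True counts as 1
    return (g - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def detect(green_flags: list[bool], gamma: float = 0.25, z_threshold: float = 4.0) -> dict:
    """Flag text as watermarked when its z-score exceeds the threshold (assumed values)."""
    z = kgw_z_score(green_flags, gamma)
    return {"is_watermarked": z > z_threshold, "score": z}


# Usage: 200 scored tokens; chance-level text has about 25% green tokens.
chance = [True] * 50 + [False] * 150
heavily_green = [True] * 120 + [False] * 80
print(detect(chance))         # score 0.0, not flagged
print(detect(heavily_green))  # large positive score, flagged
```

Because the statistic is normalized by sqrt(T), longer texts give the detector more evidence, which is one reason detection on very short outputs is unreliable.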

Visualizing Mechanisms

MarkLLM offers visualization tools that show how watermarks are embedded. Here's an example for the KGW family:

import torch
from visualize.font_settings import FontSettings
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from visualize.visualizer import DiscreteVisualizer
from visualize.legend_settings import DiscreteLegendSettings
from visualize.page_layout_settings import PageLayoutSettings
from visualize.color_scheme import ColorSchemeForDiscreteVisualization

# Load watermark algorithm and get data
# ... (similar setup as above: myWatermark, watermarked_text, unwatermarked_text)
watermarked_data = myWatermark.get_data_for_visualization(watermarked_text)
unwatermarked_data = myWatermark.get_data_for_visualization(unwatermarked_text)

# Init visualizer (discrete: tokens are highlighted by green/red list membership)
visualizer = DiscreteVisualizer(color_scheme=ColorSchemeForDiscreteVisualization(),
                                font_settings=FontSettings(),
                                page_layout_settings=PageLayoutSettings(),
                                legend_settings=DiscreteLegendSettings())

# Visualize both texts and save the images
watermarked_img = visualizer.visualize(data=watermarked_data,
                                       show_text=True,
                                       visualize_weight=True,
                                       display_legend=True)
unwatermarked_img = visualizer.visualize(data=unwatermarked_data,
                                         show_text=True,
                                         visualize_weight=True,
                                         display_legend=True)
watermarked_img.save("KGW_watermarked.png")
unwatermarked_img.save("KGW_unwatermarked.png")
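The discrete visualizer renders each token according to whether the detector counts it as green or red. As a rough, dependency-free analogue for quick inspection in a terminal, the hypothetical helpers below (not part of MarkLLM) wrap green tokens in square brackets and red tokens in parentheses and report the green fraction. They assume the per-token flags are already available, for example from a KGW-style detector.

```python
def render_tokens(tokens: list[str], green_flags: list[bool]) -> str:
    """Mark green tokens as [tok] and red tokens as (tok)."""
    if len(tokens) != len(green_flags):
        raise ValueError("tokens and green_flags must have the same length")
    return " ".join(f"[{tok}]" if green else f"({tok})"
                    for tok, green in zip(tokens, green_flags))


def green_fraction(green_flags: list[bool]) -> float:
    """Share of tokens marked green; 0.0 for an empty sequence."""
    return sum(green_flags) / len(green_flags) if green_flags else 0.0


tokens = ["Good", "morning", ",", "everyone", "."]
flags = [True, False, True, True, False]
print(render_tokens(tokens, flags))           # [Good] (morning) [,] [everyone] (.)
print(f"green: {green_fraction(flags):.0%}")  # green: 60%
```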

Applying Evaluation Pipelines

The toolkit includes pipelines for watermark detection and text quality analysis.
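The detection example below edits watermarked text before detection to measure robustness, and WordDeletion(ratio=0.3) is one such editor. A minimal stand-alone sketch of a word-deletion attack follows, assuming whitespace tokenization and a seeded random choice of positions; delete_words is a hypothetical name, and MarkLLM's WordDeletion may tokenize and round differently.

```python
import random
from typing import Optional


def delete_words(text: str, ratio: float, seed: Optional[int] = None) -> str:
    """Randomly delete int(ratio * n_words) words, keeping the rest in order."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    words = text.split()
    n_delete = int(len(words) * ratio)
    rng = random.Random(seed)
    # Choose distinct positions to drop, then rebuild the text without them.
    drop = set(rng.sample(range(len(words)), n_delete))
    return " ".join(w for i, w in enumerate(words) if i not in drop)


print(delete_words("the quick brown fox jumps over the lazy dog again", 0.3, seed=0))
```

Deleting words breaks some of the (previous token, token) pairs a KGW-style detector scores, so a robust watermark should still produce a high score after this edit.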

import torch
from evaluation.dataset import C4Dataset
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from evaluation.tools.text_editor import TruncatePromptTextEditor, WordDeletion
from evaluation.tools.success_rate_calculator import DynamicThresholdSuccessRateCalculator
from evaluation.pipelines.detection import WatermarkedTextDetectionPipeline, UnWatermarkedTextDetectionPipeline, DetectionPipelineReturnType

# Load dataset (my_dataset, e.g. a C4Dataset), device, and transformers config
# ... (similar setup as above)

# Load watermark algorithm
my_watermark = AutoWatermark.load('KGW',
                                  algorithm_config='config/KGW.json',
                                  transformers_config=transformers_config)

# Init pipelines
# Watermarked texts: remove the prompt, then delete 30% of words to test robustness
pipeline1 = WatermarkedTextDetectionPipeline(dataset=my_dataset,
                                             text_editor_list=[TruncatePromptTextEditor(), WordDeletion(ratio=0.3)],
                                             show_progress=True,
                                             return_type=DetectionPipelineReturnType.SCORES)

# Unwatermarked texts: no edits; these serve as the negative class
pipeline2 = UnWatermarkedTextDetectionPipeline(dataset=my_dataset,
                                               text_editor_list=[],
                                               show_progress=True,
                                               return_type=DetectionPipelineReturnType.SCORES)

# Evaluate: choose the detection threshold dynamically and report TPR and F1
calculator = DynamicThresholdSuccessRateCalculator(labels=['TPR', 'F1'], rule='best')
print(calculator.calculate(pipeline1.evaluate(my_watermark), pipeline2.evaluate(my_watermark)))
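DynamicThresholdSuccessRateCalculator picks the detection threshold from the observed scores rather than fixing it in advance. The sketch below assumes that rule='best' means the threshold that maximizes F1 over all observed scores (MarkLLM's exact criterion may differ) and treats watermarked texts as positives; best_threshold_metrics is a hypothetical name, not a MarkLLM function.

```python
def best_threshold_metrics(wm_scores: list[float], unwm_scores: list[float]) -> dict:
    """Scan every observed score as a threshold; keep the one with the highest F1."""
    best = None
    for thr in sorted(set(wm_scores) | set(unwm_scores)):
        tp = sum(s >= thr for s in wm_scores)    # watermarked texts detected
        fn = len(wm_scores) - tp                 # watermarked texts missed
        fp = sum(s >= thr for s in unwm_scores)  # unwatermarked texts flagged
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # recall == TPR
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if best is None or f1 > best["F1"]:
            best = {"threshold": thr, "TPR": recall, "F1": f1}
    return best


print(best_threshold_metrics([5, 6, 7], [0, 1, 2]))  # {'threshold': 5, 'TPR': 1.0, 'F1': 1.0}
```

Because the threshold is tuned on the same scores it is evaluated on, this reports a best-case operating point; a fixed threshold or a target false-positive rate gives a more conservative estimate.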

More examples and detailed usage instructions can be found in the test/ and evaluation/examples/ directories of the repository, as well as in the provided Jupyter notebooks.

Why Use MarkLLM?

MarkLLM offers several compelling reasons for researchers and developers working with LLM watermarking:

  • Unified Framework: It provides a consistent and extensible platform for implementing and comparing various watermarking algorithms, simplifying development and research.
  • Comprehensive Evaluation: With 12 evaluation tools covering detectability, robustness, and text quality, along with customizable automated pipelines, MarkLLM supports a thorough assessment of watermarking technologies.
  • Enhanced Understanding: Visualization tools offer clear insights into the operational mechanisms of different algorithms, making complex concepts more accessible.
  • Active Development: The toolkit is actively maintained, with new watermarking methods and features added as the field advances.
  • Community Contribution: It encourages community contributions, fostering a collaborative environment for making text watermarking more accessible.

Related repositories

Similar repositories that may be relevant next.

langcorn: Serve LangChain LLM Apps and Agents with FastAPI

March 2, 2026

Langcorn is an API server for deploying LangChain models and pipelines. It builds on the high-performance FastAPI framework to serve large language model applications, and offers easy installation, built-in authentication, and support for custom API keys to help bring LLM projects to production.

Topics: api, fastapi, langchain

EasyEdit: An Easy-to-Use Knowledge Editing Framework for LLMs

January 26, 2026

EasyEdit is an open-source framework designed for efficient knowledge editing in Large Language Models (LLMs). It provides a unified, easy-to-use platform to modify, insert, or erase specific knowledge within LLMs without negatively impacting overall performance. This tool is crucial for aligning LLMs with evolving user needs and correcting factual inaccuracies.

Topics: artificial-intelligence, knowledge-editing, large-language-models

LitGPT: High-Performance LLMs for Pretraining, Finetuning, and Deployment

January 4, 2026

LitGPT, by Lightning AI, is a GitHub repository offering over 20 high-performance large language models (LLMs), with recipes and tools to pretrain, finetune, and deploy them at scale. It is designed with minimal abstractions to keep implementations fast and performant for enterprise-grade AI development.

Topics: llm, large-language-models, deep-learning

ggml: A Low-Level Tensor Library for Machine Learning

December 16, 2025

ggml is a tensor library for machine learning that emphasizes a low-level, cross-platform implementation. It offers integer quantization, automatic differentiation, and broad hardware support, with zero third-party dependencies and efficient memory usage. The project is actively developed and underpins other popular projects such as llama.cpp and whisper.cpp.

Topics: machine-learning, tensor-library, large-language-models
