MarkLLM: An Open-Source Toolkit for LLM Watermarking
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
MarkLLM is an open-source toolkit designed to simplify the research and application of watermarking technologies for large language models (LLMs). It offers a unified framework for implementing various watermarking algorithms, alongside robust visualization and comprehensive evaluation tools. This toolkit helps researchers and the broader community understand and assess the authenticity and origin of machine-generated text.
Introduction
MarkLLM is a comprehensive, open-source toolkit developed to advance the research and application of watermarking technologies for Large Language Models (LLMs). As the use of LLMs expands, ensuring the authenticity and origin of machine-generated text has become critically important. MarkLLM addresses this need by providing a unified and extensible platform that simplifies access, understanding, and assessment of various watermarking algorithms.
The toolkit supports a wide array of watermarking methods, offering a streamlined approach to integrate and expand these techniques. It also includes custom visualization tools to demystify how different algorithms operate and a robust evaluation module with 12 tools to assess detectability, robustness, and impact on text quality. MarkLLM was accepted as an EMNLP 2024 Demo, highlighting its significance in the field.
Installation
To get started with MarkLLM, follow these steps:
- Python and PyTorch: Ensure you have Python 3.10 and PyTorch installed.
- Install dependencies:
    pip install -r requirements.txt
- Cython files (for EXPEdit or ITSEdit): If you plan to use the EXPEdit or ITSEdit algorithms, you'll need to compile their Cython files:
    python watermark/exp_edit/cython_files/setup.py build_ext --inplace
  Then, move the generated .so file into watermark/exp_edit/cython_files/.
- Hugging Face Models: Note that model weights for self-trained watermarking algorithms are stored on Hugging Face's Generative-Watermark-Toolkits. You should download the necessary models according to the config paths and save them to the model/ directory before running the code.
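Before running the steps above, it can help to confirm the interpreter and core packages are in place. The following is a minimal sketch, not part of MarkLLM: the function name check_environment is hypothetical, and the source only states that Python 3.10 and PyTorch are expected, so treating any other Python version as a problem is an assumption.

```python
import importlib.util
import sys


def check_environment(required=(3, 10), packages=("torch", "transformers")):
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    problems = []
    found = tuple(sys.version_info[:2])
    # Assumption: the documented version (3.10) is the only one treated as expected.
    if found != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, found {found[0]}.{found[1]}"
        )
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

An empty result only means the imports can be located; it does not verify the CUDA setup or the downloaded model weights.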
Examples
MarkLLM provides intuitive interfaces for invoking watermarking algorithms, visualizing their mechanisms, and applying evaluation pipelines.
Invoking Watermarking Algorithms
    import torch
    from watermark.auto_watermark import AutoWatermark
    from utils.transformers_config import TransformersConfig
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Device
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Transformers config
    transformers_config = TransformersConfig(model=AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b').to(device),
                                             tokenizer=AutoTokenizer.from_pretrained('facebook/opt-1.3b'),
                                             vocab_size=50272,
                                             device=device,
                                             max_new_tokens=200,
                                             min_length=230,
                                             do_sample=True,
                                             no_repeat_ngram_size=4)

    # Load watermark algorithm
    myWatermark = AutoWatermark.load('KGW',
                                     algorithm_config='config/KGW.json',
                                     transformers_config=transformers_config)

    # Prompt
    prompt = 'Good Morning.'

    # Generate and detect
    watermarked_text = myWatermark.generate_watermarked_text(prompt)
    detect_result = myWatermark.detect_watermark(watermarked_text)
    unwatermarked_text = myWatermark.generate_unwatermarked_text(prompt)
    detect_result = myWatermark.detect_watermark(unwatermarked_text)
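The example above calls detection without showing what a detector measures. As background, here is a toy sketch of green-list detection as it is commonly described for KGW-style schemes. It is not MarkLLM's code: the names is_green, z_score, and generate_tokens are hypothetical, random integers stand in for model output, the per-token hash is a simplification of a keyed vocabulary partition, and the "hard" generator that only emits green tokens replaces the logit bias a real scheme would apply.

```python
import hashlib
import math
import random

KEY = 15485863  # hypothetical secret key shared by generator and detector


def is_green(prev_token, token, gamma=0.5, key=KEY):
    """Pseudo-randomly place `token` in the green list seeded by the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    # Map the first 8 bytes to [0, 1); a fraction of about `gamma` lands in the green list.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_score(tokens, gamma=0.5, key=KEY):
    """One-proportion z-test: are there more green tokens than chance would explain?"""
    scored = len(tokens) - 1  # the first token has no predecessor to seed from
    if scored <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, gamma, key) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def generate_tokens(length, vocab_size, watermark, gamma=0.5, key=KEY, seed=0):
    """Draw random token IDs; with `watermark` set, reject every non-green candidate."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size)]
    while len(tokens) < length:
        candidate = rng.randrange(vocab_size)
        if watermark and not is_green(tokens[-1], candidate, gamma, key):
            continue
        tokens.append(candidate)
    return tokens


watermarked = generate_tokens(201, 50272, watermark=True)
plain = generate_tokens(201, 50272, watermark=False)
print("watermarked z:", round(z_score(watermarked), 2))  # sqrt(200), about 14.14
print("plain z:", round(z_score(plain), 2))
```

In the toy generator every scored token is green, so the z-score equals the square root of the number of scored tokens; unwatermarked text should stay near zero. A detector of this kind flags text whose score exceeds a chosen threshold.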
Visualizing Mechanisms
MarkLLM offers visualization tools to understand how watermarks are embedded. Here's an example for the KGW family:
    import torch
    from visualize.font_settings import FontSettings
    from watermark.auto_watermark import AutoWatermark
    from utils.transformers_config import TransformersConfig
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from visualize.visualizer import DiscreteVisualizer
    from visualize.legend_settings import DiscreteLegendSettings
    from visualize.page_layout_settings import PageLayoutSettings
    from visualize.color_scheme import ColorSchemeForDiscreteVisualization

    # Load watermark algorithm and get data
    # ... (similar setup as above for myWatermark)
    watermarked_data = myWatermark.get_data_for_visualization(watermarked_text)
    unwatermarked_data = myWatermark.get_data_for_visualization(unwatermarked_text)

    # Init visualizer
    visualizer = DiscreteVisualizer(color_scheme=ColorSchemeForDiscreteVisualization(),
                                    font_settings=FontSettings(),
                                    page_layout_settings=PageLayoutSettings(),
                                    legend_settings=DiscreteLegendSettings())

    # Visualize and Save
    watermarked_img = visualizer.visualize(data=watermarked_data,
                                           show_text=True,
                                           visualize_weight=True,
                                           display_legend=True)
    watermarked_img.save("KGW_watermarked.png")
Applying Evaluation Pipelines
The toolkit includes pipelines for watermark detection and text quality analysis.
    import torch
    from evaluation.dataset import C4Dataset
    from watermark.auto_watermark import AutoWatermark
    from utils.transformers_config import TransformersConfig
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from evaluation.tools.text_editor import TruncatePromptTextEditor, WordDeletion
    from evaluation.tools.success_rate_calculator import DynamicThresholdSuccessRateCalculator
    from evaluation.pipelines.detection import (WatermarkedTextDetectionPipeline,
                                                UnWatermarkedTextDetectionPipeline,
                                                DetectionPipelineReturnType)

    # Load dataset, device, and transformers config
    # ... (similar setup as above)

    # Load watermark algorithm
    my_watermark = AutoWatermark.load('KGW',
                                      algorithm_config='config/KGW.json',
                                      transformers_config=transformers_config)

    # Init pipelines
    pipeline1 = WatermarkedTextDetectionPipeline(dataset=my_dataset,
                                                 text_editor_list=[TruncatePromptTextEditor(), WordDeletion(ratio=0.3)],
                                                 show_progress=True,
                                                 return_type=DetectionPipelineReturnType.SCORES)
    pipeline2 = UnWatermarkedTextDetectionPipeline(dataset=my_dataset,
                                                   text_editor_list=[],
                                                   show_progress=True,
                                                   return_type=DetectionPipelineReturnType.SCORES)

    # Evaluate
    calculator = DynamicThresholdSuccessRateCalculator(labels=['TPR', 'F1'], rule='best')
    print(calculator.calculate(pipeline1.evaluate(my_watermark), pipeline2.evaluate(my_watermark)))
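The last two lines of the example turn two lists of detection scores into success-rate metrics. A minimal sketch of what a dynamic-threshold calculation could look like follows. It is an illustration, not MarkLLM's implementation: the function name best_threshold_metrics is hypothetical, and reading rule='best' as "pick the threshold with the highest F1" is an assumption.

```python
def best_threshold_metrics(watermarked_scores, unwatermarked_scores):
    """Try each observed score as a threshold and report TPR and F1 at the best-F1 one."""
    if not watermarked_scores or not unwatermarked_scores:
        raise ValueError("both score lists must be non-empty")
    best = None
    for threshold in sorted(set(watermarked_scores) | set(unwatermarked_scores)):
        # A text is predicted "watermarked" when its score reaches the threshold.
        tp = sum(s >= threshold for s in watermarked_scores)
        fn = len(watermarked_scores) - tp
        fp = sum(s >= threshold for s in unwatermarked_scores)
        tpr = tp / (tp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn)
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or f1 > best["F1"]:
            best = {"threshold": threshold, "TPR": tpr, "F1": f1}
    return best


# Made-up scores: one watermarked text was edited heavily and scores low.
result = best_threshold_metrics([5.0, 6.0, 7.0, 1.0], [0.0, 0.5, 1.5, 2.0])
print(result)  # threshold 5.0, TPR 0.75, F1 = 6/7
```

Choosing the threshold from the data gives an optimistic upper bound on detector quality; a fixed threshold or a target false positive rate is stricter.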
More examples and detailed usage instructions can be found in the test/ and evaluation/examples/ directories of the repository, as well as in the provided Jupyter notebooks.
Why Use MarkLLM?
MarkLLM offers several compelling reasons for researchers and developers working with LLM watermarking:
- Unified Framework: It provides a consistent and extensible platform for implementing and comparing various watermarking algorithms, simplifying development and research.
- Comprehensive Evaluation: With 12 evaluation tools covering detectability, robustness, and text quality, along with customizable automated pipelines, MarkLLM ensures thorough assessment of watermarking technologies.
- Enhanced Understanding: Visualization tools offer clear insights into the operational mechanisms of different algorithms, making complex concepts more accessible.
- Active Development: The toolkit is actively maintained and updated with new watermarking methods and features, reflecting the latest advancements in the field.
- Community Contribution: It encourages community contributions, fostering a collaborative environment for making text watermarking more accessible.
Links
- GitHub Repository: https://github.com/THU-BPM/MarkLLM
- Homepage: https://generative-watermark.github.io/
- Paper (arXiv): https://arxiv.org/abs/2405.10051
- Hugging Face Models: https://huggingface.co/Generative-Watermark-Toolkits
- EMNLP 2024 Demo: https://aclanthology.org/2024.emnlp-demo.7/
- Google Colab: https://colab.research.google.com/drive/169MS4dY6fKNPZ7-92ETz1bAm_xyNAs0B?usp=sharing
- Video Description: https://www.youtube.com/watch?v=QN3BhNvw14E