XGrammar: Fast, Flexible, and Portable Structured Generation for LLMs

Summary

XGrammar is an open-source library from mlc-ai for efficient, flexible, and portable structured generation. It uses constrained decoding to guarantee 100% structural correctness for outputs such as JSON and regex-constrained text. It is optimized for near-zero overhead and can be deployed across a range of platforms, hardware, and programming languages.

Introduction

XGrammar is an open-source library developed by mlc-ai for efficient, flexible, and portable structured generation. It uses constrained decoding to ensure 100% structural correctness of the output: at each generation step, only tokens that keep the output valid under the target grammar can be chosen. It supports general context-free grammars, which cover JSON, regex, and custom grammars.
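
As a rough illustration of constrained decoding, the sketch below masks out every token that the grammar does not allow before picking the next one. It is a toy written with the standard library only, not XGrammar's API: the vocabulary, the TRANSITIONS table, and the function names are all hypothetical, and the grammar is a small finite-state machine rather than a general context-free grammar.

```python
import math

# Toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["{", "}", '"a"', ":", "1", ",", "hello"]

# Hypothetical hand-written grammar for strings such as {"a":1} or {"a":1,"a":1}.
# Each state maps the tokens allowed next to the state they lead to.
TRANSITIONS = {
    "start": {"{": "key"},
    "key": {'"a"': "colon"},
    "colon": {":": "value"},
    "value": {"1": "after_value"},
    "after_value": {",": "key", "}": "done"},
}


def mask_logits(logits, state):
    """Set the logit of every token the grammar forbids in `state` to -inf."""
    allowed = TRANSITIONS[state]
    return [x if VOCAB[i] in allowed else -math.inf for i, x in enumerate(logits)]


def constrained_greedy_decode(logits_fn, max_steps=16):
    """Greedy decoding in which only grammar-approved tokens can be selected."""
    state, output = "start", []
    for _ in range(max_steps):
        if state == "done":
            break
        masked = mask_logits(logits_fn(output), state)
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best]
        output.append(token)
        state = TRANSITIONS[state][token]
    return "".join(output)


def biased_model(prefix):
    """Stand-in for an LLM: it always prefers "hello", then "}"."""
    return [0.1, 2.0, 0.3, 0.2, 0.5, 0.05, 5.0]


print(constrained_greedy_decode(biased_model))  # {"a":1}
```

Although the stand-in model scores "hello" highest at every step, the mask removes it, so the result is still well-formed. A real engine has to do the same filtering over a full vocabulary and a richer grammar at every step, which is where overhead matters.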

XGrammar adds near-zero overhead to JSON generation, which makes it one of the fastest structured generation engines available. It can be deployed on Linux, macOS, and Windows; on a range of hardware (CPU, NVIDIA GPU, AMD GPU, Apple Silicon, TPU); and from multiple languages (Python, C++, JavaScript, Swift). XGrammar is integrated into popular LLM inference engines such as vLLM, SGLang, TensorRT-LLM, and MLC-LLM.

Installation

To get started with XGrammar, you can install it via pip:

pip install xgrammar

For use with MPS on Apple Silicon, install with:

pip install "xgrammar[metal]"

Then, you can import it into your Python project:

import xgrammar as xgr
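
To confirm that the install step worked before wiring XGrammar into a larger project, a small check like the one below can help. It relies only on the standard library's importlib.metadata; the helper name installed_version is illustrative and is not part of XGrammar.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the installed xgrammar version, if any.
print(installed_version("xgrammar") or "xgrammar is not installed")
```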

Examples

Once XGrammar is installed and imported, you can integrate it into your LLM inference workflow. XGrammar provides APIs for defining a grammar and applying it during generation, so that every output adheres to the specified structure.

For comprehensive examples, detailed usage guides, and advanced configurations, please refer to the official XGrammar documentation.
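
Until you have the official examples at hand, the following sketch shows one common way a grammar can be applied during generation: the set of allowed tokens is packed into a bitmask, one bit per token, and the mask is applied to the logits before sampling. This is a standard-library illustration under that assumption. The helper names are hypothetical and are not XGrammar functions; take the real API from the official documentation.

```python
import math


def allocate_bitmask(vocab_size):
    """One bit per token, packed into 32-bit words; all tokens start rejected."""
    return [0] * ((vocab_size + 31) // 32)


def allow_token(bitmask, token_id):
    """Mark `token_id` as allowed by setting its bit."""
    bitmask[token_id // 32] |= 1 << (token_id % 32)


def apply_bitmask(logits, bitmask):
    """Return a copy of `logits` with every rejected token set to -inf."""
    return [
        x if (bitmask[i // 32] >> (i % 32)) & 1 else -math.inf
        for i, x in enumerate(logits)
    ]


vocab_size = 40
bitmask = allocate_bitmask(vocab_size)  # two words cover 40 tokens
for token_id in (3, 35):                # pretend the grammar allows these two
    allow_token(bitmask, token_id)

masked = apply_bitmask([0.0] * vocab_size, bitmask)
print([i for i, x in enumerate(masked) if x != -math.inf])  # [3, 35]
```

Packing the mask this way keeps it small (one bit per vocabulary entry), which matters when a new mask is needed at every decoding step.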

Why Use XGrammar?

Guaranteed Structural Correctness: Constrained decoding ensures that outputs are 100% structurally valid for JSON, regex, and custom grammars.
Exceptional Performance: Adds near-zero overhead to structured generation, making it one of the fastest engines available for this task.
Universal Compatibility: Supports a wide range of platforms (Linux, macOS, Windows), hardware (CPU, NVIDIA GPU, AMD GPU, Apple Silicon, TPU), and programming languages (Python, C++, JavaScript, Swift).
Easy Integration: Integrates with leading LLM inference engines, often serving as their default structured generation backend.
Active Development & Community: Backed by mlc-ai and adopted by many industry and open-source projects.