Jsonformer: Bulletproof Structured JSON Generation from Language Models
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
Jsonformer is a Python library for generating syntactically correct, schema-conforming JSON from language models. Instead of asking the model to emit an entire JSON document, it writes the fixed structural tokens itself and has the model generate only the content tokens. This makes the output more reliable and the generation more efficient.
Introduction
Getting structured JSON out of a language model is hard: outputs are often syntactically invalid or do not match the intended schema. Approaches based on prompt engineering, fine-tuning, or post-processing tend to be brittle. Jsonformer takes a different route. It wraps a Hugging Face model and, during generation, fills in the fixed JSON tokens (braces, brackets, keys, and punctuation) itself, delegating only the content tokens to the model. Because the structure never depends on the model's output, the result is reliably well-formed and less text has to be generated. It supports the JSON schema types number, boolean, string, array, and object.
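The idea of filling in structure and delegating only the content can be illustrated without a real model. The following is a minimal sketch, not Jsonformer's actual implementation: `fill_schema`, `LeafGenerator`, and `stub_generator` are hypothetical names invented for illustration, and the fixed `array_length` is a simplification of how array length would be decided in practice.

```python
import json
from typing import Any, Callable

# Hypothetical stand-in for a language model: it receives the path of the
# field being filled and that field's schema type, and returns a raw value.
LeafGenerator = Callable[[str, str], Any]

# Coercions that force each leaf into its declared schema type.
COERCE = {"string": str, "number": float, "boolean": bool}


def fill_schema(schema: dict, generate_leaf: LeafGenerator,
                path: str = "$", array_length: int = 2) -> Any:
    """Build a value matching `schema`, calling the generator only for leaves."""
    kind = schema["type"]
    if kind == "object":
        # Keys come straight from the schema; the generator never produces them.
        return {
            key: fill_schema(sub, generate_leaf, f"{path}.{key}", array_length)
            for key, sub in schema["properties"].items()
        }
    if kind == "array":
        # Simplification for this sketch: every array gets a fixed length.
        return [
            fill_schema(schema["items"], generate_leaf, f"{path}[{i}]", array_length)
            for i in range(array_length)
        ]
    if kind not in COERCE:
        raise ValueError(f"unsupported schema type: {kind!r}")
    return COERCE[kind](generate_leaf(path, kind))


def stub_generator(path: str, kind: str) -> Any:
    """Toy generator returning a placeholder per type (not a real model)."""
    return {"string": f"value for {path}", "number": 42, "boolean": True}[kind]


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {"type": "array", "items": {"type": "string"}},
    },
}

person = fill_schema(person_schema, stub_generator)
# Serialization cannot fail on structure, because the structure was never generated.
print(json.dumps(person, indent=2))
```

Whatever the generator returns, the braces, keys, and commas come from the schema walk, so the serialized result is always parseable. Only the quality of the leaf values depends on the model.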
Installation
To get started with Jsonformer, install it using pip:
pip install jsonformer
Examples
Here's a basic example demonstrating how to use Jsonformer to generate structured data based on a defined schema:
from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a Hugging Face causal language model and its tokenizer.
# This 12B-parameter model needs substantial memory to load.
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b")

# Describe the shape of the JSON to generate.
json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}

prompt = "Generate a person's information based on the following schema:"

# Calling the Jsonformer instance runs generation and returns the generated data.
jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()

print(generated_data)
Jsonformer also works with complex, nested schemas, even when paired with smaller models.
Why Use Jsonformer?
Jsonformer stands out for several key features:
- Bulletproof JSON Generation: The generated JSON is always syntactically correct and conforms to the specified schema, because the structure comes from the schema and not from the model.
- Efficiency: By generating only the variable content tokens and filling in the fixed structural tokens itself, Jsonformer does less work than approaches that generate a full JSON string and then parse it.
- Flexible and Extendable: Built upon the Hugging Face transformers library, Jsonformer is compatible with any language model that supports the Hugging Face interface, offering broad applicability.
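The conformance claim in the first bullet can be spot-checked on any generated output with a few lines of plain Python. The sketch below is an illustration only and is not part of Jsonformer: `conforms` is a hypothetical helper covering the five schema types mentioned above, and it deliberately takes a strict reading (an object must have exactly the keys listed under `properties`), which is stricter than JSON Schema itself.

```python
from typing import Any


def conforms(value: Any, schema: dict) -> bool:
    """Return True if `value` matches `schema` (object/array/string/number/boolean)."""
    kind = schema["type"]
    if kind == "object":
        props = schema["properties"]
        # Strict reading for this sketch: no missing keys and no extra keys.
        return (
            isinstance(value, dict)
            and set(value) == set(props)
            and all(conforms(value[key], sub) for key, sub in props.items())
        )
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(item, schema["items"]) for item in value
        )
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"unsupported schema type: {kind!r}")


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {"type": "array", "items": {"type": "string"}},
    },
}

good = {"name": "Ada", "age": 36, "is_student": False, "courses": ["math"]}
bad = {"name": "Ada", "age": "36", "is_student": False, "courses": ["math"]}

print(conforms(good, person_schema))  # True
print(conforms(bad, person_schema))   # False: "age" is a string, not a number
```

A check like this is useful mainly as a regression test around the generation step. For full JSON Schema semantics, a dedicated validation library would be the more appropriate tool.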
Links
- GitHub Repository: 1rgs/jsonformer
- Colab Example: Jsonformer_example.ipynb