Instructor: Structured Outputs for LLMs with Pydantic and Python
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
Instructor is a powerful Python library that simplifies extracting structured data from Large Language Models (LLMs). It integrates Pydantic for robust validation, type safety, and IDE support, eliminating the need for manual JSON parsing, error handling, and retries. This tool provides a streamlined and reliable way to get structured outputs from any LLM.
Introduction
Instructor is a Python library for extracting structured data from Large Language Models (LLMs). It builds on Pydantic for validation, type safety, and IDE support, so that responses arrive as typed, validated objects rather than raw JSON strings. Parsing, validation-error handling, and retries are handled by the library instead of by hand-written code in your application.
Installation
Getting started with Instructor is straightforward. You can install it using pip:
pip install instructor
For other package managers, you can use:
uv add instructor
poetry add instructor
Examples
Instructor allows you to define your desired output structure using Pydantic models and then extract that structure directly from natural language. Here's a basic example:
import instructor
from pydantic import BaseModel

# Define what you want
class User(BaseModel):
    name: str
    age: int

# Extract it from natural language
client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "John is 25 years old"}],
)
print(user)  # name='John' age=25
Instructor also supports advanced features like automatic retries for failed validations, streaming partial objects, and extracting complex nested data structures, making it suitable for production environments.
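The nested structures and validation mentioned above come from Pydantic itself, so they can be illustrated offline without calling a model. The following is a minimal sketch, assuming Pydantic v2 is installed; the `Address` and `Customer` models and the two JSON strings are hypothetical stand-ins for what an LLM might return. In real use, `Customer` would be passed to Instructor as the `response_model` rather than parsed by hand.

```python
from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    city: str
    country: str


class Customer(BaseModel):
    name: str
    age: int = Field(ge=0)  # reject negative ages
    addresses: list[Address]  # nested list of models


# Hypothetical raw model output: well-formed and valid.
good_json = '{"name": "John", "age": 25, "addresses": [{"city": "Paris", "country": "France"}]}'
customer = Customer.model_validate_json(good_json)
print(customer.addresses[0].city)

# Hypothetical raw model output: negative age and a missing nested field.
bad_json = '{"name": "John", "age": -3, "addresses": [{"city": "Paris"}]}'
try:
    Customer.model_validate_json(bad_json)
    error_locations = []
except ValidationError as exc:
    # Each error records where in the nested structure validation failed.
    error_locations = [err["loc"] for err in exc.errors()]
print(error_locations)
```

The error locations are what make feedback to the model useful: they point at the exact field, including its position inside a nested list, that needs to be corrected.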
Why Use Instructor?
Instructor addresses common challenges in working with LLMs by offering several key advantages:
- Simplified LLM Interactions: It abstracts away the complexity of writing intricate JSON schemas, handling validation errors, managing retries, and parsing unstructured responses.
- Pydantic Integration: By building on Pydantic, Instructor provides out-of-the-box type safety, data validation, and enhanced developer experience with IDE support.
- Provider Agnostic: Use the same simple API across various LLM providers, including OpenAI, Anthropic, Google, and local models like Ollama.
- Production-Ready Features: Includes automatic retries with error feedback for validation failures and streaming support for partial object generation, ensuring robust applications.
- Battle-Tested: Trusted by over 100,000 developers and companies, with millions of monthly downloads and thousands of GitHub stars, which indicates wide real-world use.
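The "automatic retries with error feedback" item in the list above can be sketched without any network access. This is a conceptual illustration only, not Instructor's implementation: `fake_llm`, `parse_user`, and `extract_with_retries` are hypothetical names, and the fake model is scripted to answer incorrectly once so that the loop has something to correct.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


def parse_user(raw: str) -> User:
    """Parse and validate raw JSON text; raise ValueError on bad data."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("field 'name' must be a string")
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("field 'age' must be an integer")
    return User(name=data["name"], age=age)


def fake_llm(messages: list[dict]) -> str:
    """Hypothetical stand-in for a model call: wrong until it sees feedback."""
    saw_feedback = any("Validation error" in m["content"] for m in messages)
    if saw_feedback:
        return '{"name": "John", "age": 25}'
    return '{"name": "John", "age": "twenty-five"}'


def extract_with_retries(prompt: str, max_retries: int = 2) -> tuple[User, int]:
    """Call the model, feeding each validation error back until it parses."""
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(1, max_retries + 2):
        raw = fake_llm(messages)
        try:
            return parse_user(raw), attempt
        except ValueError as exc:
            # Append the failed answer and the error so the next call can fix it.
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Validation error: {exc}. Please fix."})
    raise RuntimeError("model never produced valid output")


user, attempts = extract_with_retries("John is 25 years old")
print(user, attempts)
```

In Instructor itself, this behaviour is requested through the `max_retries` argument of the `create` call; the library's documentation describes the exact semantics.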
Compared to alternatives:
- vs Raw JSON mode: Instructor offers automatic validation, retries, streaming, and nested object support without manual schema writing.
- vs LangChain/LlamaIndex: Instructor is a lighter, faster, and more focused solution specifically for structured extraction.
- vs Custom solutions: It's a battle-tested library that handles edge cases and provides a robust foundation for your AI applications.