Instructor: Structured Outputs for LLMs with Pydantic and Python

Summary

Instructor is a Python library for extracting structured data from Large Language Models (LLMs). It builds on Pydantic for validation, type safety, and IDE support, and it handles JSON parsing, validation errors, and retries so that you do not have to write that code by hand. The same API is used across the LLM providers it supports.

Introduction

Instructor turns LLM output into typed Python objects: you describe the data you want as a Pydantic model, and the library returns a validated instance of that model. Because validation, type safety, and IDE support come from Pydantic, there is no JSON to parse by hand and no custom error-handling or retry code to maintain when you integrate an LLM into your application.

Installation

Install Instructor with pip:

pip install instructor

With uv or Poetry, use the matching command:

uv add instructor
poetry add instructor

Examples

Instructor allows you to define your desired output structure using Pydantic models and then extract that structure directly from natural language. Here's a basic example:

import instructor
from pydantic import BaseModel

# Define what you want
class User(BaseModel):
    name: str
    age: int

# Extract it from natural language
# (the OpenAI provider expects the OPENAI_API_KEY environment variable)
client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "John is 25 years old"}],
)

print(repr(user))  # expected: User(name='John', age=25)

Instructor also supports advanced features like automatic retries for failed validations, streaming partial objects, and extracting complex nested data structures, making it suitable for production environments.
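
As a rough illustration of the nested-structure and retry features, the sketch below defines a nested model with a custom validator. The model names (`Address`, `Customer`), the age rule, and the helper `extract_customer` are made up for this example. `max_retries` is the parameter Instructor's documentation describes for re-asking after a failed validation. The network call sits inside a function, so nothing is sent until you call it with an API key configured.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Address(BaseModel):
    city: str
    country: str


class Customer(BaseModel):
    name: str
    age: int
    addresses: list[Address]  # nested objects are declared as ordinary fields

    @field_validator("age")
    @classmethod
    def age_must_be_plausible(cls, value: int) -> int:
        # Hypothetical business rule, used only to show a validation failure.
        if not 0 <= value <= 130:
            raise ValueError("age must be between 0 and 130")
        return value


def extract_customer(text: str) -> Customer:
    """Ask the model for a Customer; assumes OPENAI_API_KEY is set."""
    import instructor  # imported here so the models above work without it

    client = instructor.from_provider("openai/gpt-4o-mini")
    return client.chat.completions.create(
        response_model=Customer,
        messages=[{"role": "user", "content": text}],
        max_retries=3,  # re-ask up to 3 times if validation fails
    )


# The same model validates plain data, which is handy for offline tests.
sample = {
    "name": "Ana",
    "age": 31,
    "addresses": [{"city": "Lisbon", "country": "Portugal"}],
}
customer = Customer.model_validate(sample)

try:
    Customer.model_validate({**sample, "age": 400})
    rejected = False
except ValidationError:
    rejected = True  # the validator refuses the implausible age
```

Keeping the rules on the model means the same checks apply whether the data comes from an LLM or from a test fixture.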

Why Use Instructor?

Instructor addresses common challenges in working with LLMs by offering several key advantages:

Simplified LLM Interactions: It abstracts away the complexity of writing intricate JSON schemas, handling validation errors, managing retries, and parsing unstructured responses.
Pydantic Integration: By building on Pydantic, Instructor provides out-of-the-box type safety, data validation, and enhanced developer experience with IDE support.
Provider Agnostic: Use the same simple API across various LLM providers, including OpenAI, Anthropic, Google, and local models like Ollama.
Production-Ready Features: Includes automatic retries with error feedback for validation failures and streaming support for partial object generation, ensuring robust applications.
Battle-Tested: Trusted by over 100,000 developers and companies, with millions of monthly downloads and thousands of GitHub stars, which points to sustained use in real-world projects.
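
To make "automatic retries with error feedback" concrete, here is a simplified sketch of the general pattern. It is not Instructor's actual implementation. `fake_llm` is a stand-in that returns canned strings in place of a real model, and `extract_with_retries` is a hypothetical helper name.

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


def extract_with_retries(
    ask: Callable[[list[dict]], str],
    messages: list[dict],
    max_retries: int = 2,
) -> tuple[User, int]:
    """Return the validated User and the number of attempts used."""
    history = list(messages)
    for attempt in range(1, max_retries + 2):
        raw = ask(history)
        try:
            return User.model_validate_json(raw), attempt
        except ValidationError as exc:
            # Feed the validation error back so the next attempt can correct it.
            history.append({"role": "assistant", "content": raw})
            history.append(
                {"role": "user", "content": f"Fix these errors and try again:\n{exc}"}
            )
    raise RuntimeError("no valid response after retries")


# Canned replies standing in for a real model: the first one is invalid.
replies = iter(
    ['{"name": "John", "age": "twenty-five"}', '{"name": "John", "age": 25}']
)


def fake_llm(history: list[dict]) -> str:
    return next(replies)


user, attempts = extract_with_retries(
    fake_llm, [{"role": "user", "content": "John is 25 years old"}]
)
```

Here the first reply fails validation because `age` is not an integer, the error text is appended to the conversation, and the second reply passes.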

Compared to alternatives:

vs Raw JSON mode: Instructor offers automatic validation, retries, streaming, and nested object support without manual schema writing.
vs LangChain/LlamaIndex: Instructor is a lighter, faster, and more focused solution specifically for structured extraction.
vs Custom solutions: It's a battle-tested library that handles edge cases and provides a robust foundation for your AI applications.
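
The "without manual schema writing" point can be seen with Pydantic alone: the model class already carries a JSON Schema, and one call both parses and type-checks a raw JSON string. This sketch reuses the `User` model from the earlier example. `parse_user_by_hand` is a hypothetical helper written only for contrast.

```python
import json

from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# The JSON Schema is derived from the type hints; nothing is written by hand.
schema = User.model_json_schema()
print(json.dumps(schema, indent=2))


def parse_user_by_hand(raw: str) -> dict:
    """What equivalent manual parsing and checking might look like."""
    data = json.loads(raw)
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    if not isinstance(data.get("age"), int):
        raise ValueError("age must be an integer")
    return {"name": data["name"], "age": data["age"]}


raw = '{"name": "John", "age": 25}'
by_hand = parse_user_by_hand(raw)
validated = User.model_validate_json(raw)  # parse and validate in one step
```

The hand-written version grows with every field and nested object, while the model-based version stays a single call.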