PyPriompt: Python Library for Priority-Based Prompt Design

Summary

PyPriompt is a Python library for designing prompts, inspired by web design libraries like React and FastHTML. It intelligently manages context windows by using priorities to decide what information to include in the prompt, ensuring efficient use of token limits. This tool helps developers create dynamic and adaptable prompts for large language models.

Repository Info

Updated on November 27, 2025

Introduction

PyPriompt (Python + priority + prompt) is a Python port of the Priompt library, remixed with FastHTML. It introduces a structured approach to prompt design, allowing developers to manage the context window of large language models (LLMs) by assigning priorities to different parts of the prompt. This ensures that the most critical information is always included, even when facing token limits. The library draws inspiration from web design frameworks, aiming to bring similar component-based development principles to prompt engineering. You can read more about the motivation behind this approach in the Prompt Design article.

Installation

PyPriompt can be easily installed using various Python package managers:

uv add priompt
rye add priompt
poetry add priompt
pip install priompt

Examples

Prompts in PyPriompt are rendered from Python components, allowing for dynamic and flexible prompt construction. Here's an example demonstrating a simple chat component that manages message history with priorities:

from priompt import (
    component,  # Decorator for custom components
    # Standard components
    SystemMessage,  # For system messages
    AssistantMessage,  # For assistant messages
    UserMessage,  # For user messages
    # Priompt components
    Empty,  # For reserving empty space
    Scope,  # For setting priorities
    # For rendering prompts to OpenAI chat messages or a string
    render,
    O200KTokenizer,  # For GPT-4o, GPT-4o-mini
)

@component
def chat(name: str, message: str, history: list[dict[str, str]]):
    capitalized_name = name[0].upper() + name[1:]

    return [
        SystemMessage(f"The user's name is {capitalized_name}. Please respond to them kindly."),
        *[
            Scope(
                UserMessage(m["message"]) if m["role"] == "user" else AssistantMessage(m["message"]),
                prel=i,
            )
            for i, m in enumerate(history)
        ],
        UserMessage(message),
        Empty(1000),
    ]

messages = render(
    chat(
        name="cyrus",
        message="What is the answer to life, the universe, and everything?",
        history=[
            {"role": "user", "message": "Hello!"},
            {"role": "assistant", "message": "Hello! How can I help you today?"},
        ],
    ),
    {
        "token_limit": 8192, # However many tokens you want to limit to
        "tokenizer": O200KTokenizer,
    }
)

openai.chat.completions.create(
    ...,  # model and other parameters elided
    messages=messages,
)

In this example, the chat component defines a prompt structure. The SystemMessage and the latest UserMessage are always included. Historical messages are wrapped in Scope with prel=i, meaning later messages (higher i) are prioritized over earlier ones. The render function then processes this component, respecting the token_limit and including as many high-priority items as possible.

Why Use PyPriompt

PyPriompt offers a powerful way to manage LLM prompts, especially when dealing with dynamic content and strict token limits. Its core principles revolve around component-based design and priority-driven content inclusion.

Key Principles:

  • Component-Based Prompt Design: Just like in web frameworks, you can create reusable Python components to build complex prompts, making them modular and maintainable.
  • Priority System: Each child within a component can be assigned an absolute (p) or relative (prel) priority. Higher priority items are favored for inclusion when token limits are approached.
  • Token Limit Optimization: The renderer guarantees that it will find the minimum priority cutoff such that the prompt's total token count stays within the specified limit, maximizing the included relevant information.
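The cutoff search described above can be sketched in plain Python. This is a simplified model of the idea, not PyPriompt's actual implementation; the function name and data layout are hypothetical:

```python
# Simplified model of priority-driven inclusion: each scope has a
# priority and a token count, and the renderer keeps every scope whose
# priority is at or above the chosen cutoff. Illustrative sketch only,
# not PyPriompt's real algorithm.

def find_cutoff(scopes: list[tuple[int, int]], token_limit: int) -> int:
    """Return the minimum priority cutoff whose included scopes fit.

    `scopes` is a list of (priority, token_count) pairs.
    """
    # Try cutoffs from the lowest priority upward; the first cutoff
    # that fits is the most inclusive one.
    for cutoff in sorted({p for p, _ in scopes}):
        total = sum(tokens for p, tokens in scopes if p >= cutoff)
        if total <= token_limit:
            return cutoff
    raise ValueError("even the highest-priority scopes exceed the limit")

# Example: system message and latest user turn (priority 100),
# two history turns at priorities 0 and 1 (as with prel=i above).
scopes = [(100, 20), (0, 30), (1, 30), (100, 15)]
find_cutoff(scopes, token_limit=70)  # -> 1: the oldest history turn is dropped
```

With a generous limit the cutoff falls to the lowest priority and everything is included; as the limit tightens, the cutoff rises and low-priority scopes are ejected first.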

Building Blocks:

PyPriompt provides several specialized components to control prompt structure and content:

  • Scope: Sets priorities for its children.
  • First: Includes the first child with sufficiently high priority, useful for fallbacks.
  • Empty: Reserves a specified number of tokens, useful for ensuring space for model generation.
  • Capture: Captures and parses output directly within the prompt.
  • Isolate: Isolates a section with its own token limit, useful for caching or guaranteeing specific content.
  • Br: Forces a token break, useful for precise tokenization control.
  • Config: Specifies common configuration properties like stop tokens or maxResponseTokens.

Built-in Components for LLMs:

  • UserMessage, AssistantMessage, SystemMessage: For constructing standard chat-based prompts.
  • Image: For incorporating images into multimodal prompts.
  • Tools: For defining tools that the AI can call using a JSON schema.
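The exact Tools call signature is not shown here, but tool definitions of this kind typically follow the OpenAI-style function-calling JSON schema. A hypothetical definition (the tool name and fields are illustrative, not from PyPriompt):

```python
# A hypothetical tool definition in the OpenAI function-calling JSON
# schema shape; the Tools component is described as accepting a JSON
# schema like this, though its exact API is not shown above.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}
```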

Advanced Features:

  • Callbacks (on_eject, on_include): Execute custom logic when a scope is either excluded or included in the final prompt, allowing for adaptive prompt behavior.
  • Sourcemaps: When enabled, the renderer can compute a map between prompt characters and the parts of the component tree they came from, aiding in debugging and caching strategies.
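The callback behavior can be modeled in plain Python: during rendering, each scope either survives the cutoff (firing on_include) or is ejected (firing on_eject). A simplified sketch, with a hypothetical data layout rather than PyPriompt's real API:

```python
def render_with_callbacks(scopes: list[dict], cutoff: int) -> list[str]:
    """Fire per-scope callbacks depending on whether each scope survives.

    Each scope is a dict with 'priority', 'content', and optional
    'on_include' / 'on_eject' callables. Simplified model only.
    """
    included = []
    for scope in scopes:
        if scope["priority"] >= cutoff:
            included.append(scope["content"])
            if cb := scope.get("on_include"):
                cb()  # the scope made it into the final prompt
        else:
            if cb := scope.get("on_eject"):
                cb()  # the scope was dropped for token-budget reasons
    return included

events = []
scopes = [
    {"priority": 10, "content": "system",
     "on_include": lambda: events.append("kept system")},
    {"priority": 1, "content": "old history",
     "on_eject": lambda: events.append("dropped history")},
]
render_with_callbacks(scopes, cutoff=5)
# events == ["kept system", "dropped history"]
```

This kind of hook lets a prompt adapt at render time, e.g. logging which context was sacrificed or switching to a shorter strategy when a scope is ejected.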

Important Considerations (Caveats):

While powerful, it's important to be aware of certain aspects:

  • Overusing priorities can sometimes be an anti-pattern, potentially making prompts harder to cache.
  • The current version renders around 10K scopes reasonably fast, which is sufficient for most use cases.
  • For latency-critical prompts, monitoring performance in a preview dashboard is recommended.
  • The renderer's token limit optimization is generally accurate but may have slight inaccuracies in specific edge cases involving First components.
