Attachments: The Python Funnel for LLM Context and Multimodal Data

Summary
Attachments simplifies providing context to Large Language Models by transforming various file types into model-ready text and images. This Python library acts as a universal funnel, enabling developers to integrate diverse data sources like PDFs, images, web content, and even entire code repositories with just a few lines of code. It supports popular LLM APIs and frameworks, making multimodal AI development more accessible.
Repository Info
Tags
Click on any tag to explore related repositories
Introduction
attachments is a powerful Python library designed to simplify providing context to Large Language Models (LLMs). It acts as a universal funnel, transforming various file types, including PDFs, images, web pages, and even entire code repositories, into model-ready text and base64 encoded images. With attachments, developers can integrate diverse data sources into their LLM applications with just a few lines of code, streamlining multimodal AI development.
Installation
Getting started with attachments is straightforward. You can install it using pip:
pip install attachments
For advanced features like CSS selector highlighting or Microsoft Office support, specific extras can be installed:
# For CSS selector highlighting (requires Playwright)
pip install attachments[browser]
playwright install chromium
# For Microsoft Office format support
pip install attachments[office]
Examples
attachments offers a simple API for common use cases and a powerful Domain Specific Language (DSL) for advanced processing.
Quick Start
Transform any file into LLM-ready content:
from attachments import Attachments
# Process a local file or a URL
ctx = Attachments("path/to/file.pdf")
# or
ctx = Attachments("https://example.com/document.docx")
llm_ready_text = str(ctx) # All extracted text, "prompt-engineered"
llm_ready_images = ctx.images # list[str] – base64 PNGs
print(f"Extracted text length: {len(llm_ready_text)}")
print(f"Number of images: {len(llm_ready_images)}")
LLM Integration
attachments provides direct integration with popular LLM APIs like OpenAI and Anthropic, formatting the content appropriately:
from openai import OpenAI
from attachments import Attachments
# Process a PowerPoint presentation, selecting slides 3 to 5
pptx = Attachments("https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample_multipage.pptx[3-5]")
client = OpenAI()
resp = client.chat.completions.create(
model="gpt-4.1-nano", # Use an appropriate vision-capable model
messages=pptx.openai_chat("Analyze the following document:")
)
print(resp.choices[0].message.content)
Advanced DSL Usage
The DSL allows for precise control over content extraction and transformation:
- Select pages or slides:
report.pdf[1,3-5,-1] - Web content selection:
url[select:title] - Image transformations:
photo.jpg[rotate:90][crop:100,100,400,300] - Repository analysis:
./my-project[ignore:standard][max_files:100]
Why Use Attachments?
attachments stands out for several key reasons:
- Comprehensive Multimodal Support: It handles a wide array of formats out of the box, including PDFs, PowerPoint, Word, Excel, CSV, TXT, Markdown, HTML, various image types, ZIP archives, and even Git repositories.
- Simplified LLM Context: It abstracts away the complexity of parsing and formatting diverse data for LLMs, providing a unified
textandimagesoutput. - Powerful DSL: The intuitive Domain Specific Language allows for granular control over content extraction, filtering, and transformation, enabling complex workflows with concise syntax.
- Extensibility: The modular pipeline architecture allows users to easily extend its capabilities by adding custom loaders, modifiers, presenters, refiners, and adapters.
- Direct LLM API Integration: It provides helper methods to format content directly for OpenAI, Anthropic, and DSPy, saving development time.
- Advanced Features: Capabilities like CSS selector highlighting for web scraping and dedicated Microsoft Office support enhance its utility for specialized tasks.
Links
Explore the attachments project further on GitHub:
- GitHub Repository: MaximeRivest/attachments