# Attachments: The Python Funnel for LLM Context and Multimodal Data

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/maximerivest-attachments
Generated for open source discovery and AI-assisted research.

Attachments simplifies providing context to Large Language Models by transforming various file types into model-ready text and images. This Python library acts as a universal funnel, enabling developers to integrate diverse data sources like PDFs, images, web content, and even entire code repositories with just a few lines of code. It supports popular LLM APIs and frameworks, making multimodal AI development more accessible.

GitHub: https://github.com/MaximeRivest/attachments
OSRepos URL: https://osrepos.com/repo/maximerivest-attachments

## Summary

Attachments simplifies providing context to Large Language Models by transforming various file types into model-ready text and images. This Python library acts as a universal funnel, enabling developers to integrate diverse data sources like PDFs, images, web content, and even entire code repositories with just a few lines of code. It supports popular LLM APIs and frameworks, making multimodal AI development more accessible.

## Topics

- Python
- LLM
- Multimodal AI
- Data Processing
- File Conversion
- Web Scraping
- AI Tools
- Developer Tools

## Repository Information

Last analyzed by OSRepos: Mon Nov 24 2025 08:01:14 GMT+0000 (Western European Standard Time)
Detail views: 5
GitHub clicks: 4

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Content

## Introduction

`attachments` is a powerful Python library designed to simplify providing context to Large Language Models (LLMs). It acts as a universal funnel, transforming various file types, including PDFs, images, web pages, and even entire code repositories, into model-ready text and base64 encoded images. With `attachments`, developers can integrate diverse data sources into their LLM applications with just a few lines of code, streamlining multimodal AI development.

## Installation

Getting started with `attachments` is straightforward. You can install it using pip:

bash
pip install attachments


For advanced features like CSS selector highlighting or Microsoft Office support, specific extras can be installed:

bash
# For CSS selector highlighting (requires Playwright)
pip install attachments[browser]
playwright install chromium

# For Microsoft Office format support
pip install attachments[office]


## Examples

`attachments` offers a simple API for common use cases and a powerful Domain Specific Language (DSL) for advanced processing.

**Quick Start**

Transform any file into LLM-ready content:

python
from attachments import Attachments

# Process a local file or a URL
ctx = Attachments("path/to/file.pdf") 
# or 
ctx = Attachments("https://example.com/document.docx")

llm_ready_text   = str(ctx)       # All extracted text, "prompt-engineered"
llm_ready_images = ctx.images     # list[str] – base64 PNGs

print(f"Extracted text length: {len(llm_ready_text)}")
print(f"Number of images: {len(llm_ready_images)}")


**LLM Integration**

`attachments` provides direct integration with popular LLM APIs like OpenAI and Anthropic, formatting the content appropriately:

python
from openai import OpenAI
from attachments import Attachments

# Process a PowerPoint presentation, selecting slides 3 to 5
pptx = Attachments("https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample_multipage.pptx[3-5]")

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4.1-nano", # Use an appropriate vision-capable model
    messages=pptx.openai_chat("Analyze the following document:")
)
print(resp.choices[0].message.content)


**Advanced DSL Usage**

The DSL allows for precise control over content extraction and transformation:

*   Select pages or slides: `report.pdf[1,3-5,-1]`
*   Web content selection: `url[select:title]`
*   Image transformations: `photo.jpg[rotate:90][crop:100,100,400,300]`
*   Repository analysis: `./my-project[ignore:standard][max_files:100]`

## Why Use Attachments?

`attachments` stands out for several key reasons:

*   **Comprehensive Multimodal Support**: It handles a wide array of formats out of the box, including PDFs, PowerPoint, Word, Excel, CSV, TXT, Markdown, HTML, various image types, ZIP archives, and even Git repositories.
*   **Simplified LLM Context**: It abstracts away the complexity of parsing and formatting diverse data for LLMs, providing a unified `text` and `images` output.
*   **Powerful DSL**: The intuitive Domain Specific Language allows for granular control over content extraction, filtering, and transformation, enabling complex workflows with concise syntax.
*   **Extensibility**: The modular pipeline architecture allows users to easily extend its capabilities by adding custom loaders, modifiers, presenters, refiners, and adapters.
*   **Direct LLM API Integration**: It provides helper methods to format content directly for OpenAI, Anthropic, and DSPy, saving development time.
*   **Advanced Features**: Capabilities like CSS selector highlighting for web scraping and dedicated Microsoft Office support enhance its utility for specialized tasks.

## Links

Explore the `attachments` project further on GitHub:

*   **GitHub Repository**: [MaximeRivest/attachments](https://github.com/MaximeRivest/attachments){target="_blank"}