Attachments: The Python Funnel for LLM Context and Multimodal Data

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Attachments: The Python Funnel for LLM Context and Multimodal Data

Summary

Attachments simplifies providing context to Large Language Models by transforming various file types into model-ready text and images. This Python library acts as a universal funnel, enabling developers to integrate diverse data sources like PDFs, images, web content, and even entire code repositories with just a few lines of code. It supports popular LLM APIs and frameworks, making multimodal AI development more accessible.

Repository Information

Analyzed by OSRepos on November 24, 2025

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

attachments is a powerful Python library designed to simplify providing context to Large Language Models (LLMs). It acts as a universal funnel, transforming various file types, including PDFs, images, web pages, and even entire code repositories, into model-ready text and base64 encoded images. With attachments, developers can integrate diverse data sources into their LLM applications with just a few lines of code, streamlining multimodal AI development.

Installation

Getting started with attachments is straightforward. You can install it using pip:

pip install attachments

For advanced features like CSS selector highlighting or Microsoft Office support, specific extras can be installed:

# For CSS selector highlighting (requires Playwright)
pip install attachments[browser]
playwright install chromium

# For Microsoft Office format support
pip install attachments[office]

Examples

attachments offers a simple API for common use cases and a powerful Domain Specific Language (DSL) for advanced processing.

Quick Start

Transform any file into LLM-ready content:

from attachments import Attachments

# Process a local file or a URL
ctx = Attachments("path/to/file.pdf") 
# or 
ctx = Attachments("https://example.com/document.docx")

llm_ready_text   = str(ctx)       # All extracted text, "prompt-engineered"
llm_ready_images = ctx.images     # list[str] – base64 PNGs

print(f"Extracted text length: {len(llm_ready_text)}")
print(f"Number of images: {len(llm_ready_images)}")

LLM Integration

attachments provides direct integration with popular LLM APIs like OpenAI and Anthropic, formatting the content appropriately:

from openai import OpenAI
from attachments import Attachments

# Process a PowerPoint presentation, selecting slides 3 to 5
pptx = Attachments("https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample_multipage.pptx[3-5]")

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4.1-nano", # Use an appropriate vision-capable model
    messages=pptx.openai_chat("Analyze the following document:")
)
print(resp.choices[0].message.content)

Advanced DSL Usage

The DSL allows for precise control over content extraction and transformation:

  • Select pages or slides: report.pdf[1,3-5,-1]
  • Web content selection: url[select:title]
  • Image transformations: photo.jpg[rotate:90][crop:100,100,400,300]
  • Repository analysis: ./my-project[ignore:standard][max_files:100]

Why Use Attachments?

attachments stands out for several key reasons:

  • Comprehensive Multimodal Support: It handles a wide array of formats out of the box, including PDFs, PowerPoint, Word, Excel, CSV, TXT, Markdown, HTML, various image types, ZIP archives, and even Git repositories.
  • Simplified LLM Context: It abstracts away the complexity of parsing and formatting diverse data for LLMs, providing a unified text and images output.
  • Powerful DSL: The intuitive Domain Specific Language allows for granular control over content extraction, filtering, and transformation, enabling complex workflows with concise syntax.
  • Extensibility: The modular pipeline architecture allows users to easily extend its capabilities by adding custom loaders, modifiers, presenters, refiners, and adapters.
  • Direct LLM API Integration: It provides helper methods to format content directly for OpenAI, Anthropic, and DSPy, saving development time.
  • Advanced Features: Capabilities like CSS selector highlighting for web scraping and dedicated Microsoft Office support enhance its utility for specialized tasks.

Links

Explore the attachments project further on GitHub:

Related repositories

Similar repositories that may be relevant next.

PromptBench: A Unified Framework for LLM Evaluation and Robustness

PromptBench: A Unified Framework for LLM Evaluation and Robustness

July 1, 2026

PromptBench is a comprehensive Python library designed for the evaluation and understanding of Large Language Models (LLMs). It provides a unified framework for assessing model performance, exploring various prompt engineering techniques, and evaluating robustness against adversarial attacks. This tool empowers researchers to conduct in-depth analyses of LLMs across diverse datasets and models.

large-language-modelsLLM Evaluationprompt-engineering
LangTest: A Comprehensive Library for Safe & Effective Language Models

LangTest: A Comprehensive Library for Safe & Effective Language Models

June 30, 2026

LangTest is an open-source Python library dedicated to ensuring the safety and effectiveness of language models. It offers a comprehensive framework for testing model quality, covering robustness, bias, fairness, and accuracy across various NLP tasks and LLM providers. With LangTest, developers can generate and execute over 60 distinct test types with just one line of code, promoting responsible AI development.

ai-safetyai-testinglarge-language-models
EvalPlus: Rigorous Evaluation for LLM-Synthesized Code

EvalPlus: Rigorous Evaluation for LLM-Synthesized Code

June 30, 2026

EvalPlus is a robust framework designed for the rigorous evaluation of code generated by Large Language Models (LLMs). It extends standard benchmarks like HumanEval and MBPP with significantly more tests, offering precise assessment of code correctness and efficiency. This tool is crucial for developers and researchers aiming to thoroughly validate LLM-synthesized code.

benchmarklarge-language-modelsprogram-synthesis
AgentEvals: Robust Evaluation Tools for LLM Agent Trajectories

AgentEvals: Robust Evaluation Tools for LLM Agent Trajectories

June 30, 2026

AgentEvals is a powerful open-source package from LangChain designed to simplify the evaluation of agentic applications. It provides a collection of ready-made evaluators and utilities, with a particular focus on analyzing agent trajectories, the intermediate steps an agent takes to solve problems. This helps developers understand and improve the reliability and performance of their LLM agents.

PythonLLMAgents

Source repository

Open the original repository on GitHub.

View on GitHub
OS
OSRepos

Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

Monitor your website with YourWebsiteScore

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of third-party repository code is at your own risk. Always review source code, dependencies, licenses, and security implications before running anything.

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️