{"name":"Attachments: The Python Funnel for LLM Context and Multimodal Data","description":"Attachments simplifies providing context to Large Language Models by transforming various file types into model-ready text and images. This Python library acts as a universal funnel, enabling developers to integrate diverse data sources like PDFs, images, web content, and even entire code repositories with just a few lines of code. It supports popular LLM APIs and frameworks, making multimodal AI development more accessible.","github":"https://github.com/MaximeRivest/attachments","url":"https://osrepos.com/repo/maximerivest-attachments","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/maximerivest-attachments","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/maximerivest-attachments.md","json":"https://osrepos.com/repo/maximerivest-attachments.json","topics":["Python","LLM","Multimodal AI","Data Processing","File Conversion","Web Scraping","AI Tools","Developer Tools"],"keywords":["Python","LLM","Multimodal AI","Data Processing","File Conversion","Web Scraping","AI Tools","Developer Tools"],"stars":null,"summary":"Attachments simplifies providing context to Large Language Models by transforming various file types into model-ready text and images. This Python library acts as a universal funnel, enabling developers to integrate diverse data sources like PDFs, images, web content, and even entire code repositories with just a few lines of code. It supports popular LLM APIs and frameworks, making multimodal AI development more accessible.","content":"## Introduction\n\n`attachments` is a powerful Python library designed to simplify providing context to Large Language Models (LLMs). 
It acts as a universal funnel, transforming various file types, including PDFs, images, web pages, and even entire code repositories, into model-ready text and base64 encoded images. With `attachments`, developers can integrate diverse data sources into their LLM applications with just a few lines of code, streamlining multimodal AI development.\n\n## Installation\n\nGetting started with `attachments` is straightforward. You can install it using pip:\n\n```bash\npip install attachments\n```\n\nFor advanced features like CSS selector highlighting or Microsoft Office support, specific extras can be installed:\n\n```bash\n# For CSS selector highlighting (requires Playwright)\npip install attachments[browser]\nplaywright install chromium\n\n# For Microsoft Office format support\npip install attachments[office]\n```\n\n## Examples\n\n`attachments` offers a simple API for common use cases and a powerful Domain Specific Language (DSL) for advanced processing.\n\n**Quick Start**\n\nTransform any file into LLM-ready content:\n\n```python\nfrom attachments import Attachments\n\n# Process a local file or a URL\nctx = Attachments(\"path/to/file.pdf\")\n# or\nctx = Attachments(\"https://example.com/document.docx\")\n\nllm_ready_text   = str(ctx)       # All extracted text, \"prompt-engineered\"\nllm_ready_images = ctx.images     # list[str] – base64 PNGs\n\nprint(f\"Extracted text length: {len(llm_ready_text)}\")\nprint(f\"Number of images: {len(llm_ready_images)}\")\n```\n\n**LLM Integration**\n\n`attachments` provides direct integration with popular LLM APIs like OpenAI and Anthropic, formatting the content appropriately:\n\n```python\nfrom openai import OpenAI\nfrom attachments import Attachments\n\n# Process a PowerPoint presentation, selecting slides 3 to 5\npptx = Attachments(\"https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample_multipage.pptx[3-5]\")\n\nclient = OpenAI()\nresp = client.chat.completions.create(\n    model=\"gpt-4.1-nano\", # Use an appropriate 
vision-capable model\n    messages=pptx.openai_chat(\"Analyze the following document:\")\n)\nprint(resp.choices[0].message.content)\n```\n\n**Advanced DSL Usage**\n\nThe DSL allows for precise control over content extraction and transformation:\n\n*   Select pages or slides: `report.pdf[1,3-5,-1]`\n*   Web content selection: `url[select:title]`\n*   Image transformations: `photo.jpg[rotate:90][crop:100,100,400,300]`\n*   Repository analysis: `./my-project[ignore:standard][max_files:100]`\n\n## Why Use Attachments?\n\n`attachments` stands out for several key reasons:\n\n*   **Comprehensive Multimodal Support**: It handles a wide array of formats out of the box, including PDFs, PowerPoint, Word, Excel, CSV, TXT, Markdown, HTML, various image types, ZIP archives, and even Git repositories.\n*   **Simplified LLM Context**: It abstracts away the complexity of parsing and formatting diverse data for LLMs, providing a unified `text` and `images` output.\n*   **Powerful DSL**: The intuitive Domain Specific Language allows for granular control over content extraction, filtering, and transformation, enabling complex workflows with concise syntax.\n*   **Extensibility**: The modular pipeline architecture allows users to easily extend its capabilities by adding custom loaders, modifiers, presenters, refiners, and adapters.\n*   **Direct LLM API Integration**: It provides helper methods to format content directly for OpenAI, Anthropic, and DSPy, saving development time.\n*   **Advanced Features**: Capabilities like CSS selector highlighting for web scraping and dedicated Microsoft Office support enhance its utility for specialized tasks.\n\n## Links\n\nExplore the `attachments` project further on GitHub:\n\n*   **GitHub Repository**: [MaximeRivest/attachments](https://github.com/MaximeRivest/attachments)","metrics":{"detailViews":5,"githubClicks":4},"dates":{"published":null,"modified":"2025-11-24T08:01:14.000Z"}}