Docling: Streamline Document Processing for Generative AI Applications

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

Docling is a Python library that parses documents in many formats and prepares them for generative AI applications. It provides advanced PDF understanding and a unified document representation (DoclingDocument), and it integrates with AI frameworks such as LangChain and LlamaIndex.

Repository Information

Analyzed by OSRepos on July 3, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

Docling is an open-source project that simplifies document processing and makes documents ready for generative AI applications. It parses diverse document formats, with advanced PDF understanding, and integrates with the generative AI ecosystem. The output is structured data that AI models can consume.

Installation

Install Docling with pip:

pip install docling

Note: Python 3.9 support was dropped in Docling version 2.70.0. Please use Python 3.10 or higher.

Docling works on macOS, Linux, and Windows environments for both x86_64 and arm64 architectures. For more detailed installation instructions, refer to the official documentation.
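
As a minimal sketch of the version requirement noted above, the check below confirms that the running interpreter is Python 3.10 or higher before Docling is installed or imported. The names `MIN_PYTHON` and `python_is_supported` are hypothetical helpers for illustration and are not part of Docling.

```python
import sys

# Minimum version taken from the note above (assumption: it still applies
# to the Docling release you install).
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if python_is_supported():
        print("Python version is new enough for current Docling releases.")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Python {found} is too old; use Python 3.10 or higher.")
```

Running the check in a setup script gives a clear message up front instead of an installation or import error later.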

Examples

Convert a Document (CLI)

You can convert a document directly from the command line:

docling https://arxiv.org/pdf/2206.01062

This generates a .md file in the current directory containing structured document content.

You can also use Visual Language Models (VLMs) like GraniteDocling via the CLI:

docling --pipeline vlm --vlm-model granite_docling https://arxiv.org/pdf/2206.01062

Python Usage (Recommended)

To integrate Docling into your own code, use the Python API:

from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # a document via a local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"

More advanced usage and configuration options are covered in the official documentation.
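
In practice the Markdown export is often written to a file rather than printed. The sketch below wraps the calls shown above (`DocumentConverter`, `convert`, `export_to_markdown`) in a small helper. The function name `write_markdown`, the optional `converter` parameter, and the output path are assumptions made for illustration, not part of Docling's API.

```python
from pathlib import Path


def write_markdown(source: str, out_path: str, converter=None) -> Path:
    """Convert `source` and write its Markdown export to `out_path`.

    `converter` is any object with a Docling-style `convert(source)` method.
    When omitted, a DocumentConverter is created (requires docling installed).
    """
    if converter is None:
        # Imported lazily so the helper can be defined without docling present.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    result = converter.convert(source)
    markdown = result.document.export_to_markdown()

    target = Path(out_path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    target.write_text(markdown, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Needs network access to fetch the PDF; the output path is hypothetical.
    path = write_markdown("https://arxiv.org/pdf/2408.09869", "output/report.md")
    print(f"Wrote {path}")
```

Passing the converter in as an argument lets one instance be reused across many calls, and makes the helper easy to exercise with a stand-in object.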

Why Use Docling?

Docling's main features for document processing and AI integration:

  • Multi-Format Support: Parses a wide range of formats, including PDF, DOCX, PPTX, XLSX, HTML, EPUB, email formats, images, and more.
  • Advanced PDF Understanding: Goes beyond basic extraction, understanding page layout, reading order, table structure, code, formulas, and image classification.
  • Unified Representation: Provides a unified, expressive DoclingDocument representation format for easy manipulation.
  • Plug-and-Play Integrations: Seamlessly connects with popular AI frameworks like LangChain, LlamaIndex, Crew AI, and Haystack for agentic AI.
  • Local Execution: Runs entirely on your own machine, which suits sensitive data and air-gapped environments.
  • Comprehensive OCR Support: Includes extensive OCR support for scanned PDFs and images.
  • Flexible Services: Can be run as a service with the API server (docling-serve) or connected to any agent using the MCP server.
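
Multi-format support and local execution combine naturally in a batch job over a local folder. The following is a rough sketch under stated assumptions: `convert_folder` and `SUFFIXES` are hypothetical names, the suffix set is only an illustrative subset of the formats listed above, and the only Docling calls used are `convert` and `export_to_markdown` from the earlier example.

```python
from pathlib import Path

# Illustrative subset of the formats listed above; not Docling's full list.
SUFFIXES = {".pdf", ".docx", ".pptx", ".xlsx", ".html"}


def convert_folder(in_dir: str, out_dir: str, converter) -> dict:
    """Convert every matching file in `in_dir` to Markdown in `out_dir`.

    Returns {"converted": [...], "failed": [...]} holding file names.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    report = {"converted": [], "failed": []}

    for path in sorted(Path(in_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUFFIXES:
            continue  # skip directories and unlisted formats
        try:
            result = converter.convert(str(path))
            markdown = result.document.export_to_markdown()
        except Exception:
            # One unreadable file should not stop the rest of the batch.
            report["failed"].append(path.name)
            continue
        # Keep the original suffix in the name so a.pdf and a.docx do not collide.
        (out / f"{path.name}.md").write_text(markdown, encoding="utf-8")
        report["converted"].append(path.name)

    return report


if __name__ == "__main__":
    from docling.document_converter import DocumentConverter

    # "docs" and "markdown" are hypothetical folder names.
    print(convert_folder("docs", "markdown", DocumentConverter()))
```

Collecting failures in a report rather than raising keeps long batches running and leaves a list of files to inspect afterwards.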

Related repositories

Similar repositories that may be relevant next.

DeepFabric: High-Quality Synthetic Data for Agentic AI Systems

July 2, 2026

DeepFabric is an open-source Python library designed to generate high-quality synthetic training data for language models and agent evaluations. It excels at creating domain-specific datasets that teach models to think, plan, and act effectively, including correct tool usage and adherence to schema structures. This comprehensive pipeline also integrates training and evaluation capabilities, ensuring robust model development.

python, ai, machine-learning

OpenMontage: The First Open-Source, Agentic Video Production System

June 29, 2026

OpenMontage is the world's first open-source, agentic video production system, designed to transform your AI coding assistant into a full video production studio. It features 12 pipelines, 52 tools, and over 500 agent skills, enabling end-to-end video creation from a simple prompt. This powerful tool handles research, scripting, asset generation, editing, and final composition, including the unique ability to produce real video from stock footage.

agentic-ai, video-production, open-source

Guardrails: Enhancing LLM Reliability and Structured Data Generation

June 26, 2026

Guardrails is a Python framework designed to build reliable AI applications by adding guardrails to large language models. It helps detect, quantify, and mitigate risks in LLM inputs/outputs, and facilitates the generation of structured data. This framework ensures more predictable and safer interactions with AI models.

ai, foundation-model, llm

OpenPencil: The AI-Native, Open-Source Figma Alternative Design Editor

June 21, 2026

OpenPencil is an innovative AI-native design editor, serving as a powerful open-source alternative to Figma. It supports .fig files, integrates AI for design creation, and provides a fully programmable toolkit with a headless Vue SDK. This project emphasizes real-time collaboration and local data control, making it a compelling choice for designers and developers seeking flexibility and ownership.

ai, design-editor, figma-alternative
