{"name":"Docling: Streamline Document Processing for Generative AI Applications","description":"Docling is a powerful Python library designed to simplify document processing, preparing diverse formats for generative AI applications. It offers advanced parsing capabilities, including sophisticated PDF understanding, and provides a unified document representation. With seamless integrations into the AI ecosystem, Docling empowers developers to build robust AI solutions.","github":"https://github.com/DS4SD/docling","url":"https://osrepos.com/repo/ds4sd-docling","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/ds4sd-docling","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/ds4sd-docling.md","json":"https://osrepos.com/repo/ds4sd-docling.json","topics":["ai","document-parsing","pdf-converter","python","generative-ai","nlp","data-extraction","document-automation"],"keywords":["ai","document-parsing","pdf-converter","python","generative-ai","nlp","data-extraction","document-automation"],"stars":null,"summary":"Docling is a powerful Python library designed to simplify document processing, preparing diverse formats for generative AI applications. It offers advanced parsing capabilities, including sophisticated PDF understanding, and provides a unified document representation. With seamless integrations into the AI ecosystem, Docling empowers developers to build robust AI solutions.","content":"## Introduction\n\nDocling is an open-source project aimed at simplifying document processing, making documents ready for generative AI applications. It excels at parsing diverse document formats, including advanced PDF understanding, and offers seamless integrations with the generative AI ecosystem. 
With Docling, you can transform complex documents into structured, usable data for AI models.\n\n## Installation\n\nInstalling Docling is straightforward using pip:\n\n```bash\npip install docling\n```\n\n**Note:** Python 3.9 support was dropped in Docling version 2.70.0. Please use Python 3.10 or higher.\n\nDocling works on macOS, Linux, and Windows environments for both x86_64 and arm64 architectures. For more detailed installation instructions, refer to the [official documentation](https://docling-project.github.io/docling/getting_started/installation/){:target=\"_blank\"}.\n\n## Examples\n\n### Convert a Document (CLI)\n\nYou can convert a document directly from the command line:\n\n```bash\ndocling https://arxiv.org/pdf/2206.01062\n```\n\nThis generates a `.md` file in the current directory containing structured document content.\n\nYou can also use Visual Language Models (VLMs) like GraniteDocling via the CLI:\n\n```bash\ndocling --pipeline vlm --vlm-model granite_docling https://arxiv.org/pdf/2206.01062\n```\n\n### Python Usage (Recommended)\n\nFor programmatic integration, Python usage is recommended:\n\n```python\nfrom docling.document_converter import DocumentConverter\n\nsource = \"https://arxiv.org/pdf/2408.09869\"  # a document via a local path or URL\nconverter = DocumentConverter()\nresult = converter.convert(source)\nprint(result.document.export_to_markdown())  # output: \"## Docling Technical Report[...]\"\n```\n\nMore advanced [usage](https://docling-project.github.io/docling/usage/){:target=\"_blank\"} and [configuration](https://docling-project.github.io/docling/getting_started/installation/){:target=\"_blank\"} options are available.\n\n## Why Use Docling?\n\nDocling offers a robust set of features that make it an essential tool for document processing and AI integration:\n\n*   **Multi-Format Support:** Parses a wide range of formats, including PDF, DOCX, PPTX, XLSX, HTML, EPUB, email formats, images, and more.\n*   **Advanced PDF Understanding:** Goes beyond basic 
extraction, understanding page layout, reading order, table structure, code, formulas, and image classification.\n*   **Unified Representation:** Provides a unified, expressive `DoclingDocument` representation format for easy manipulation.\n*   **Plug-and-Play Integrations:** Seamlessly connects with popular AI frameworks like LangChain, LlamaIndex, Crew AI, and Haystack for agentic AI.\n*   **Local Execution:** Ensures data privacy and security with local execution capabilities, ideal for sensitive data and air-gapped environments.\n*   **Comprehensive OCR Support:** Includes extensive OCR support for scanned PDFs and images.\n*   **Flexible Services:** Can be run as a service with the API server (docling-serve) or connected to any agent using the MCP server.\n\n## Links\n\n*   **Docling GitHub Repository:** [https://github.com/docling-project/docling](https://github.com/docling-project/docling){:target=\"_blank\"}\n*   **Official Documentation:** [https://docling-project.github.io/docling/](https://docling-project.github.io/docling/){:target=\"_blank\"}\n*   **Docling on PyPI:** [https://pypi.org/project/docling/](https://pypi.org/project/docling/){:target=\"_blank\"}\n*   **Docling Technical Report:** [https://arxiv.org/abs/2408.09869](https://arxiv.org/abs/2408.09869){:target=\"_blank\"}\n*   **Examples:** [https://docling-project.github.io/docling/examples/](https://docling-project.github.io/docling/examples/){:target=\"_blank\"}\n*   **Integrations:** [https://docling-project.github.io/docling/integrations/](https://docling-project.github.io/docling/integrations/){:target=\"_blank\"}","metrics":{"detailViews":1,"githubClicks":2},"dates":{"published":null,"modified":"2026-07-03T12:49:05.000Z"}}