{"name":"Docling: Streamlining Document Processing for Generative AI","description":"Docling is a powerful Python library designed to simplify document processing and prepare diverse formats for generative AI applications. It excels at parsing various document types, including advanced PDF understanding, and offers seamless integrations with popular AI frameworks. With Docling, developers can efficiently extract, transform, and utilize document content for their AI models.","github":"https://github.com/docling-project/docling","url":"https://osrepos.com/repo/docling-project-docling","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/docling-project-docling","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/docling-project-docling.md","json":"https://osrepos.com/repo/docling-project-docling.json","topics":["Python","AI","Document Parsing","PDF Processing","Generative AI","Data Extraction","Markdown Conversion","LangChain Integration"],"keywords":["Python","AI","Document Parsing","PDF Processing","Generative AI","Data Extraction","Markdown Conversion","LangChain Integration"],"stars":null,"summary":"Docling is a powerful Python library designed to simplify document processing and prepare diverse formats for generative AI applications. It excels at parsing various document types, including advanced PDF understanding, and offers seamless integrations with popular AI frameworks. With Docling, developers can efficiently extract, transform, and utilize document content for their AI models.","content":"## Introduction\nDocling is an open-source Python library from the `docling-project` that revolutionizes how documents are prepared for generative AI. 
It parses and interprets a wide range of document formats, from standard PDFs and Office files to HTML, Markdown, and even audio. Docling aims to simplify extracting structured information from unstructured and semi-structured documents so that AI models and applications can consume it directly. Its features include PDF layout analysis, table structure recognition, and support for several export formats.\n\n## Installation\nYou can install Docling with pip:\n\n```bash\npip install docling\n```\n\nDocling requires Python 3.10 or higher. It runs on macOS, Linux, and Windows, on both x86_64 and arm64 architectures. For more detailed instructions, refer to the [official documentation](https://docling-project.github.io/docling/installation/).\n\n## Examples\nDocling offers both a Python API and a command-line interface (CLI) for document conversion.\n\n**Python API Example:**\nTo convert individual documents programmatically:\n\n```python\nfrom docling.document_converter import DocumentConverter\n\nsource = \"https://arxiv.org/pdf/2408.09869\"  # local file path or URL of the document\nconverter = DocumentConverter()\nresult = converter.convert(source)\nprint(result.document.export_to_markdown())  # output: \"## Docling Technical Report[...]\"\n```\n\n**CLI Example:**\nYou can also convert documents directly from your terminal:\n\n```bash\ndocling https://arxiv.org/pdf/2206.01062\n```\n\nThe Docling CLI also supports Visual Language Models (VLMs) such as GraniteDocling:\n\n```bash\ndocling --pipeline vlm --vlm-model granite_docling https://arxiv.org/pdf/2206.01062\n```\n\nExplore more usage examples and advanced options in the [documentation](https://docling-project.github.io/docling/usage/).\n\n## Why Use Docling\nDocling stands out for its comprehensive approach to document processing 
for AI. It parses many formats, and its PDF understanding covers layout, reading order, and table structure. The unified `DoclingDocument` representation gives every input format the same data model, and export options such as Markdown and lossless JSON let you pick the output that fits your pipeline. Docling also offers plug-and-play integrations with AI frameworks such as LangChain, LlamaIndex, and Haystack for agentic AI development. Because it can run locally, documents do not have to leave your machine, which makes it suitable for sensitive environments.\n\n## Links\n*   [GitHub Repository](https://github.com/docling-project/docling)\n*   [Official Documentation](https://docling-project.github.io/docling/)\n*   [PyPI Package](https://pypi.org/project/docling/)\n*   [Docling Technical Report (arXiv)](https://arxiv.org/abs/2408.09869)\n*   [Discord Community](https://docling.ai/discord)","metrics":{"detailViews":3,"githubClicks":3},"dates":{"published":null,"modified":"2026-03-22T20:59:33.000Z"}}