PDF Craft: Convert Scanned PDF Books to Markdown and EPUB

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

PDF Craft is a Python library that converts PDF files, especially scanned books, into Markdown and EPUB. It uses DeepSeek OCR to extract text, tables, and formulas while preserving document structure. Conversion is fast and runs locally, which makes the library a good fit for digitizing complex documents.

Repository Information

Analyzed by OSRepos on March 9, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

PDF Craft is a Python library for converting PDF files into other formats, with a particular focus on scanned books. It uses DeepSeek OCR for document recognition and can handle complex content such as tables and formulas. The converted Markdown or EPUB output keeps the structure and readability of the original document: footnotes and images are handled properly, and a table of contents is generated automatically.

Installation

To get started, install PDF Craft with pip. The commands below install PyTorch from the CPU-only wheel index and then PDF Craft itself. For actual conversion you also need Poppler (for PDF parsing) and a configured CUDA environment (for OCR recognition).

pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install pdf-craft

For detailed instructions on installing Poppler and configuring CUDA, please refer to the official Installation Guide.
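
Before running a conversion, it can help to confirm that both prerequisites are visible from Python. The sketch below is not part of PDF Craft: `check_environment` is a hypothetical helper written for illustration. It assumes that finding Poppler's `pdftoppm` command on the PATH is a reasonable sign that Poppler is installed, and it uses PyTorch's `torch.cuda.is_available()` to report whether a CUDA device can be seen.

```python
import shutil


def check_environment() -> dict:
    """Report whether Poppler and a CUDA-capable PyTorch are visible.

    Hypothetical helper for illustration; not part of pdf-craft.
    """
    # Assumption: Poppler counts as installed if its pdftoppm tool is on PATH.
    poppler_path = shutil.which("pdftoppm")

    try:
        # Imported lazily so the check still runs when PyTorch is absent.
        import torch

        torch_installed = True
        cuda_available = bool(torch.cuda.is_available())
    except ImportError:
        torch_installed = False
        cuda_available = False

    return {
        "poppler": poppler_path is not None,
        "torch": torch_installed,
        "cuda": cuda_available,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

With the CPU-only wheels from the commands above, the `cuda` entry is expected to be reported as missing, which matches the note that a CUDA environment has to be configured separately.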

Examples

PDF Craft provides straightforward APIs for converting PDFs to Markdown or EPUB.

Convert to Markdown

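from pdf_craft import transform_markdown

# Convert a PDF into a single Markdown file.
transform_markdown(
    pdf_path="input.pdf",            # source PDF, for example a scanned book
    markdown_path="output.md",       # Markdown file to write
    markdown_assets_path="images",   # folder for extracted assets such as images
)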
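
When a whole folder of scans needs converting, the single call above can be wrapped in a loop. The following is a minimal sketch, not an API provided by PDF Craft: `convert_folder` is a hypothetical helper, and the converter is passed in as a parameter so that the path handling can be tried without the OCR stack installed. It assumes the converter accepts the same three keyword arguments as the `transform_markdown` call above. Each book gets its own output folder on the assumption that `markdown_assets_path` is resolved relative to the Markdown file; that behaviour is not confirmed here and is worth checking against the library documentation.

```python
from pathlib import Path
from typing import Callable


def convert_folder(
    pdf_dir: str,
    out_dir: str,
    convert: Callable[..., object],
) -> list[Path]:
    """Convert every PDF in pdf_dir and return the Markdown paths written.

    Hypothetical helper; `convert` is expected to take the keyword
    arguments pdf_path, markdown_path and markdown_assets_path.
    """
    out_root = Path(out_dir)
    written = []
    # Sorted so the processing order is stable across runs.
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # One folder per book keeps the outputs of different books apart.
        book_dir = out_root / pdf.stem
        book_dir.mkdir(parents=True, exist_ok=True)
        markdown_path = book_dir / f"{pdf.stem}.md"
        convert(
            pdf_path=str(pdf),
            markdown_path=str(markdown_path),
            markdown_assets_path="images",
        )
        written.append(markdown_path)
    return written
```

With PDF Craft installed, a call such as `convert_folder("scans", "converted", transform_markdown)` would process each PDF in turn; the folder names here are placeholders.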

Convert to EPUB

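from pdf_craft import transform_epub, BookMeta

# Convert a PDF into an EPUB file with basic book metadata.
transform_epub(
    pdf_path="input.pdf",        # source PDF
    epub_path="output.epub",     # EPUB file to write
    book_meta=BookMeta(
        title="Book Title",      # title for the generated book
        authors=["Author"],      # list of author names
    ),
)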
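
Since the two entry points differ mainly in their output argument, a small wrapper can pick one from the output file's extension. This is a sketch under stated assumptions rather than a feature of PDF Craft: `pick_output_format` and `convert` are hypothetical names, and the only library calls used are the two shown in the examples above, with the same keyword arguments.

```python
from pathlib import Path
from typing import Sequence


def pick_output_format(output_path: str) -> str:
    """Map an output file name to 'markdown' or 'epub' by its suffix."""
    suffix = Path(output_path).suffix.lower()
    if suffix in {".md", ".markdown"}:
        return "markdown"
    if suffix == ".epub":
        return "epub"
    raise ValueError(f"unsupported output extension: {suffix!r}")


def convert(
    pdf_path: str,
    output_path: str,
    title: str = "Untitled",
    authors: Sequence[str] = (),
) -> None:
    """Hypothetical wrapper that dispatches to the matching pdf_craft call."""
    fmt = pick_output_format(output_path)
    # Imported lazily so the suffix check above works without pdf-craft.
    from pdf_craft import BookMeta, transform_epub, transform_markdown

    if fmt == "markdown":
        transform_markdown(
            pdf_path=pdf_path,
            markdown_path=output_path,
            markdown_assets_path="images",
        )
    else:
        transform_epub(
            pdf_path=pdf_path,
            epub_path=output_path,
            book_meta=BookMeta(title=title, authors=list(authors)),
        )
```

Validating the extension before importing the library means a mistyped output name fails immediately instead of after the OCR dependencies have loaded.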

Why Use PDF Craft

PDF Craft is lightweight and fast. Because it relies entirely on DeepSeek OCR and runs locally, conversions involve no network requests and none of the long waits that come with them. It accurately identifies document structure, extracts body text, and filters out interfering elements such as headers and footers, which makes it well suited to academic and technical documents. An online demo platform is also available for trying it out without installing anything.

Links

Source repository

Open the original repository on GitHub.

