# PDF Craft: Convert Scanned PDF Books to Markdown and EPUB

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/oomol-lab-pdf-craft
Generated for open source discovery and AI-assisted research.

PDF Craft is a Python library that converts PDF files, especially scanned books, into formats such as Markdown and EPUB. It uses DeepSeek OCR to extract text, tables, and formulas while preserving document structure. Conversion runs locally, which makes the project a good fit for digitizing complex documents.

GitHub: https://github.com/oomol-lab/pdf-craft

## Topics

- deepseek-ocr
- document
- ocr
- pdf
- Python
- conversion
- ebook

## Repository Information

Last analyzed by OSRepos: Mon Mar 09 2026 09:00:00 GMT+0000 (Western European Standard Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Content

## Introduction
PDF Craft is a Python library that converts PDF files into other formats, with a particular focus on scanned books. It uses DeepSeek OCR for document recognition and can handle complex content such as tables and formulas. The converted Markdown or EPUB output is meant to keep the structure and readability of the original document, including footnotes, images, and an automatically generated table of contents.

## Installation
To get started, install PDF Craft with pip. The commands below use the CPU wheel index for PyTorch and are enough to install the package. For actual conversion you also need Poppler for PDF parsing and a CUDA environment for OCR.

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install pdf-craft
```


For detailed instructions on installing Poppler and configuring CUDA, see the official [Installation Guide](https://github.com/oomol-lab/pdf-craft/blob/main/docs/INSTALLATION.md).

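Before attempting a conversion, the prerequisites named above can be probed from Python. This is a minimal sketch and not part of PDF Craft: `check_environment` is a hypothetical helper, it assumes Poppler can be detected through its `pdftoppm` command being on `PATH`, and it only reports whether PyTorch can see a CUDA device, which does not guarantee that OCR will run.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report which prerequisites appear to be present (hypothetical helper)."""
    status = {
        # Poppler ships command-line tools such as pdftoppm; assume one on PATH
        # means Poppler is installed.
        "poppler": shutil.which("pdftoppm") is not None,
        # Look the packages up without importing them.
        "torch": importlib.util.find_spec("torch") is not None,
        "pdf_craft": importlib.util.find_spec("pdf_craft") is not None,
        "cuda": False,
    }
    if status["torch"]:
        try:
            import torch

            status["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated the same as "no CUDA".
            status["cuda"] = False
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:10s} {'ok' if ok else 'missing'}")
```

Any entry reported as missing points back to the Installation Guide linked above.
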
## Examples
PDF Craft provides straightforward APIs for converting PDFs to Markdown or EPUB.

### Convert to Markdown
```python
from pdf_craft import transform_markdown

transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    markdown_assets_path="images",
)
```

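When several books need converting, the same call can be driven from a small job list. The sketch below is an illustration under stated assumptions: `plan_outputs` and `convert_all` are hypothetical helpers, the converter is passed in as a callable so the planning logic can be tried without OCR, and `markdown_assets_path` is kept at the `"images"` value from the example because how that path is resolved is not stated here.

```python
from pathlib import Path
from typing import Callable, Iterable


def plan_outputs(pdf_paths: Iterable[str], out_dir: str,
                 assets_name: str = "images") -> list:
    """Build one keyword-argument dict per PDF (hypothetical helper)."""
    jobs = []
    for pdf in pdf_paths:
        stem = Path(pdf).stem
        jobs.append({
            "pdf_path": str(pdf),
            # One folder per book, named after the PDF file.
            "markdown_path": str(Path(out_dir) / stem / f"{stem}.md"),
            # Assumption: same value as the single-file example above.
            "markdown_assets_path": assets_name,
        })
    return jobs


def convert_all(jobs: list, convert: Callable[..., object]) -> list:
    """Call convert(**job) for each job and return the PDFs that failed."""
    failed = []
    for job in jobs:
        try:
            convert(**job)
        except Exception as exc:
            # Keep going so one bad scan does not stop the batch.
            print(f"failed: {job['pdf_path']}: {exc}")
            failed.append(job["pdf_path"])
    return failed
```

With PDF Craft installed, `convert_all(plan_outputs(paths, "out"), transform_markdown)` would run the conversion shown above once per file. Whether the output folders must exist beforehand is not covered here, so create them first if in doubt.
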

### Convert to EPUB
```python
from pdf_craft import transform_epub, BookMeta

transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    book_meta=BookMeta(
        title="Book Title",
        authors=["Author"],
    ),
)
```

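The EPUB call needs an output path and book metadata for every input. One way to derive them, sketched here with a hypothetical `epub_job` helper, is to reuse the PDF file name: the output gets the same name with an `.epub` suffix, and the title falls back to the file stem when none is given. The helper only prepares plain values and does not import PDF Craft.

```python
from pathlib import Path
from typing import Optional, Sequence


def epub_job(pdf_path: str, title: Optional[str] = None,
             authors: Sequence[str] = ()) -> dict:
    """Derive EPUB output path and metadata values (hypothetical helper)."""
    pdf = Path(pdf_path)
    # Fallback title guessed from the file name: separators become spaces.
    guessed = pdf.stem.replace("_", " ").replace("-", " ").strip()
    return {
        "pdf_path": str(pdf),
        "epub_path": str(pdf.with_suffix(".epub")),
        "title": title or guessed,
        "authors": list(authors),
    }
```

The values can then be passed to the call shown above, for example `transform_epub(pdf_path=job["pdf_path"], epub_path=job["epub_path"], book_meta=BookMeta(title=job["title"], authors=job["authors"]))`.
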

## Why Use PDF Craft
PDF Craft aims to be lightweight and fast. Because it relies on DeepSeek OCR and runs locally, conversion needs no network requests and avoids the waiting they bring. It identifies document structure, extracts body text, and filters out interfering elements such as headers and footers, which suits academic and technical documents. An [online demo platform](https://pdf.oomol.com/) is also available for trying it without installing anything.

## Links
Explore PDF Craft further through these resources:
*   [GitHub Repository](https://github.com/oomol-lab/pdf-craft)
*   [Online Demo](https://pdf.oomol.com/)
*   [PyPI Package](https://pypi.org/project/pdf-craft/)