gptpdf: Effortlessly Parse PDFs into Markdown with GPT-4o

Summary

gptpdf is a Python library that uses large visual models such as GPT-4o to parse PDF documents into clean Markdown. In only 293 lines of code, it preserves typography, math formulas, tables, and images, offering an efficient and cost-effective way to convert complex PDFs.

Repository Info

Updated on October 24, 2025

Introduction

gptpdf is a Python library that parses PDF documents and converts them into structured Markdown. Using large visual models such as GPT-4o, it aims for near-perfect parsing of complex PDF elements, including typography, mathematical formulas, tables, pictures, and charts. The codebase is only 293 lines long, which keeps it efficient and easy to understand.

The project uses the GeneralAgent library to interact with the OpenAI API and is cost-effective, averaging just $0.013 per page. For users who prefer a visual interface, pdfgpt-ui provides a web-based tool built on gptpdf.
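
To make the per-page figure concrete, here is a small budgeting sketch. It assumes the quoted $0.013 average holds for your documents; real cost depends on the model and on how dense each page is, so treat the result as a rough estimate. The function name `estimate_cost_usd` is made up for this example and is not part of gptpdf.

```python
# Average cost per page quoted in the project description (USD).
AVERAGE_COST_PER_PAGE_USD = 0.013


def estimate_cost_usd(page_count: int,
                      cost_per_page: float = AVERAGE_COST_PER_PAGE_USD) -> float:
    """Return a rough cost estimate, in USD, for parsing `page_count` pages."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    # Round to cents; this is a budgeting aid, not a billing calculation.
    return round(page_count * cost_per_page, 2)


if __name__ == "__main__":
    for pages in (10, 200, 1000):
        print(f"{pages} pages: about ${estimate_cost_usd(pages):.2f}")
```

At the quoted average, a 1,000-page batch comes to roughly $13.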

Installation

Getting started with gptpdf is straightforward. You can install it directly via pip:

pip install gptpdf

Examples

gptpdf is designed for ease of use, whether you're running it locally or in a cloud environment like Google Colab.

Local Usage

Here's a quick example of how to parse a PDF file using your OpenAI API key:

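from gptpdf import parse_pdf
# Replace these with your own API key and the path to the PDF to parse.
api_key = 'Your OpenAI API Key'
pdf_path = 'path/to/your.pdf'
# parse_pdf returns the Markdown text and the paths of the extracted images.
content, image_paths = parse_pdf(pdf_path, api_key=api_key)
print(content)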
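
In practice you will probably not want to hard-code the key or only print the result. The sketch below wraps the same `parse_pdf(pdf_path, api_key=api_key)` call, reads the key from an environment variable, and writes the Markdown to a file. The variable name `OPENAI_API_KEY` and the helpers `load_api_key`, `markdown_path_for`, and `parse_to_markdown_file` are assumptions of this example, not part of gptpdf.

```python
import os
from pathlib import Path
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the API key from the environment (variable name is an assumption)."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")
    return api_key


def markdown_path_for(pdf_path: str, output_dir: str) -> Path:
    """Map 'docs/paper.pdf' to '<output_dir>/paper.md'."""
    return Path(output_dir) / (Path(pdf_path).stem + ".md")


def parse_to_markdown_file(pdf_path: str, output_dir: str = "output") -> Path:
    """Parse `pdf_path` with gptpdf and save the resulting Markdown."""
    # Imported lazily so the helpers above work even without gptpdf installed.
    from gptpdf import parse_pdf

    content, image_paths = parse_pdf(pdf_path, api_key=load_api_key())

    out_file = markdown_path_for(pdf_path, output_dir)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(content, encoding="utf-8")
    print(f"Wrote {out_file} ({len(image_paths)} extracted images)")
    return out_file
```

Calling `parse_to_markdown_file("paper.pdf")` would then write `output/paper.md`. Image links inside the Markdown may be relative to wherever gptpdf saved the extracted images, so check them if you move the file.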

For more detailed examples, you can refer to the test/test.py file in the repository.

Google Colab

A dedicated notebook is available for those who prefer to use gptpdf in Google Colab, providing a quick tour and setup instructions: examples/gptpdf_Quick_Tour.ipynb.

You can also explore sample output from parsed PDFs in the repository.

Why Use gptpdf?

gptpdf stands out as a robust solution for PDF to Markdown conversion due to several key advantages:

  • High Accuracy: It excels at preserving the original layout and content, including complex elements like mathematical equations, tables, and embedded images, by intelligently identifying and processing non-text areas.
  • Cost-Effective: With an average cost of $0.013 per page, it offers an economical way to process large volumes of documents.
  • Versatile Model Support: While optimized for GPT-4o, gptpdf supports a range of other multimodal large models, including Qwen-VL-Max, GLM-4V, and Yi-Vision, as well as models served through Azure OpenAI, so users can choose the model that best fits their needs.
  • Customizable Prompts: Advanced users can define custom prompts to tune the model's behavior for specific parsing requirements.
  • Simplicity and Efficiency: The core logic is contained within a small, manageable codebase, making it easy to integrate and maintain.
  • Community and Contribution: The project encourages community involvement, offering a WeChat group for support and contributions.
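
The model and prompt options in the list above can be sketched as follows. This is a hedged example rather than the library's documented interface: the keyword names `model`, `base_url`, and `prompt`, and the `"prompt"` key inside the prompt dictionary, are assumptions to check against the version of gptpdf you have installed. The endpoint URL is a placeholder, the model identifier is only an example, and both helper functions are hypothetical.

```python
from typing import Any, Dict, Optional


def build_parse_kwargs(
    api_key: str,
    model: str = "gpt-4o",
    base_url: Optional[str] = None,
    custom_prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for parse_pdf (argument names are assumptions)."""
    kwargs: Dict[str, Any] = {"api_key": api_key, "model": model}
    if base_url is not None:
        # Endpoint of an OpenAI-compatible service hosting the chosen model.
        kwargs["base_url"] = base_url
    if custom_prompt is not None:
        # Assumed shape: a dict of prompt overrides keyed by prompt name.
        kwargs["prompt"] = {"prompt": custom_prompt}
    return kwargs


def parse_with_options(pdf_path: str, **kwargs: Any):
    """Call gptpdf's parse_pdf with the collected options."""
    # Imported lazily so build_parse_kwargs can be used without gptpdf installed.
    from gptpdf import parse_pdf

    return parse_pdf(pdf_path, **kwargs)


# Example options for a non-default model behind a placeholder endpoint.
example_kwargs = build_parse_kwargs(
    api_key="Your API Key",
    model="qwen-vl-max",                   # example model identifier
    base_url="https://example.invalid/v1",  # placeholder, not a real endpoint
    custom_prompt="Render every table as a Markdown table.",
)
```

Passing `example_kwargs` to `parse_with_options("paper.pdf", **example_kwargs)` would then run the parse with those settings, provided the keyword names match your installed version.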
