gptpdf: Effortlessly Parse PDFs into Markdown with GPT-4o

Introduction

gptpdf is a Python library that parses PDF documents and converts them into structured Markdown. It relies on large visual models such as GPT-4o to handle complex PDF elements, including typography, mathematical formulas, tables, pictures, and charts, and the project describes the results as near-perfect. The codebase is also concise, at only 293 lines of code, which makes it easy to read and understand.

The project uses the GeneralAgent library to interact with the OpenAI API, and parsing costs an average of $0.013 per page. For users who prefer a visual interface, pdfgpt-ui provides a web-based tool built on gptpdf.

Installation

Getting started with gptpdf is straightforward. You can install it directly via pip:

pip install gptpdf

Examples

gptpdf is designed for ease of use, whether you're running it locally or in a cloud environment like Google Colab.

Local Usage

Here's a quick example of how to parse a PDF file using your OpenAI API key:

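from gptpdf import parse_pdf

api_key = 'Your OpenAI API Key'
pdf_path = 'path/to/your.pdf'  # replace with the PDF you want to parse
# parse_pdf returns the Markdown text and a list of image paths
content, image_paths = parse_pdf(pdf_path, api_key=api_key)
print(content)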

For more detailed examples, you can refer to the test/test.py file in the repository.
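
Once parse_pdf has returned, you will usually want to keep the result rather than only print it. The sketch below is one way to do that with the standard library alone. It assumes that content is a Markdown string and that image_paths is a list of file paths, as the variable names in the example above suggest. The helpers save_markdown and missing_images are hypothetical names introduced here for illustration and are not part of gptpdf.

```python
from pathlib import Path
from typing import Iterable, List


def save_markdown(content: str, out_file: str = "output.md") -> Path:
    """Write the parsed Markdown text to out_file, creating parent folders."""
    out_path = Path(out_file)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(content, encoding="utf-8")
    return out_path


def missing_images(image_paths: Iterable[str]) -> List[str]:
    """Return the image paths that do not exist on disk (assumed to be file paths)."""
    return [p for p in image_paths if not Path(p).is_file()]
```

With the variables from the example above, this could be called as save_markdown(content, 'out/result.md'), followed by missing_images(image_paths) to check that every referenced image was actually written.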

Google Colab

A dedicated notebook is available for those who prefer to use gptpdf in Google Colab, providing a quick tour and setup instructions: examples/gptpdf_Quick_Tour.ipynb.

You can also explore the output of parsed PDFs, such as:

Attention Is All You Need from attention_is_all_you_need.pdf.
Another example from rh.pdf.

Why Use gptpdf?

gptpdf stands out as a robust solution for PDF to Markdown conversion due to several key advantages:

High Accuracy: It excels at preserving the original layout and content, including complex elements like mathematical equations, tables, and embedded images, by intelligently identifying and processing non-text areas.
Cost-Effective: With an average cost of $0.013 per page, it offers an economical way to process large volumes of documents.
Versatile Model Support: While optimized for GPT-4o, gptpdf supports a range of other multimodal large models, including Qwen-VL-Max, GLM-4V, and Yi-Vision, as well as models served through Azure OpenAI, so users can choose the model that best fits their needs.
Customizable Prompts: Advanced users can define custom prompts to fine-tune the model's behavior for specific parsing requirements, ensuring optimal results.
Simplicity and Efficiency: The core logic is contained within a small, manageable codebase, making it easy to integrate and maintain.
Community and Contribution: The project encourages community involvement, offering a WeChat group for support and contributions.
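
To put the per-page figure in context, the short sketch below estimates the cost of a batch of documents from the quoted average of $0.013 per page. It is a back-of-the-envelope calculation only: the function name estimate_cost is hypothetical, and actual costs depend on the model chosen and on how dense each page is.

```python
COST_PER_PAGE_USD = 0.013  # average quoted above; real cost varies by page and model


def estimate_cost(num_pages: int, cost_per_page: float = COST_PER_PAGE_USD) -> float:
    """Return an approximate parsing cost in USD, rounded to cents."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    return round(num_pages * cost_per_page, 2)


if __name__ == "__main__":
    # Rough estimates for a few batch sizes
    for pages in (10, 200, 1000):
        print(f"{pages} pages: about ${estimate_cost(pages):.2f}")
```

At that average, a 1,000-page batch comes to roughly $13.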
