PaddleOCR: A Powerful OCR Toolkit for Structured Document Data

Introduction

PaddleOCR is a production-ready Optical Character Recognition (OCR) and document AI engine developed by the PaddlePaddle team. It provides end-to-end pipelines that convert PDF and image documents into structured, AI-friendly formats such as JSON and Markdown. With support for over 100 languages, PaddleOCR bridges the gap between raw visual documents and Large Language Models (LLMs) while remaining a lightweight toolkit. The project has over 72,000 stars on GitHub, a sign of its wide adoption in the AI community. Recent additions include PaddleOCR-VL-1.5 for real-world document parsing and text spotting, and PP-OCRv5 for universal scene text recognition.

Installation

To get started with PaddleOCR, you first need to install PaddlePaddle. Refer to the PaddlePaddle Installation Guide for detailed instructions. Once PaddlePaddle is installed, you can install the PaddleOCR toolkit using pip:

# If you only want to use the basic text recognition feature (returns text position coordinates and content), including the PP-OCR series
python -m pip install paddleocr

For full functionality, including document parsing, understanding, and translation, you can install with the [all] dependency group:

# If you want to use all features such as document parsing, document understanding, document translation, key information extraction, etc.
python -m pip install "paddleocr[all]"

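PaddleOCR also supports installing only some of the optional features by specifying other dependency groups: doc-parser for document parsing, ie for information extraction, and trans for document translation. For example, python -m pip install "paddleocr[doc-parser]" installs only the document parsing extras.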
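
Because these groups use pip's standard extras syntax, several can be requested at once as a comma-separated list inside the brackets. The following is a minimal Python sketch, not part of PaddleOCR itself, that builds the install command for a chosen set of groups. The helper name build_install_command is hypothetical; the group names are the ones listed above.

```python
import sys


def build_install_command(groups=()):
    """Return the pip command (as an argument list) for PaddleOCR plus optional groups."""
    requirement = "paddleocr"
    if groups:
        # pip extras syntax: package[group1,group2]
        requirement += "[" + ",".join(groups) + "]"
    # sys.executable keeps the install in the interpreter that is currently running.
    return [sys.executable, "-m", "pip", "install", requirement]


# Hypothetical helper in use: document parsing plus translation, without the full [all] set.
print(build_install_command(["doc-parser", "trans"])[-1])
```

Passing the returned list to subprocess.run would perform the install; building the list separately makes it easy to inspect the exact requirement string first.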

Examples

PaddleOCR offers both a command-line interface (CLI) and a Python API for inference.

CLI Examples:

# Run PP-OCRv5 inference
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False  

# Run PP-StructureV3 inference
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png --use_doc_orientation_classify False --use_doc_unwarping False

# Run PaddleOCR-VL inference
paddleocr doc_parser -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png

API Example (PP-OCRv5):

# Initialize PaddleOCR instance
from paddleocr import PaddleOCR
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False)

# Run OCR inference on a sample image 
result = ocr.predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

# Print each result, then save a visualization image and a JSON file to the "output" directory
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
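
Once the JSON is saved, downstream code usually needs only the recognized strings and their confidence scores. The sketch below filters a result dictionary by score. It assumes, and this should be checked against an actual output file, that texts and scores are stored as parallel lists under the keys rec_texts and rec_scores, optionally nested under a top-level res key. The function name and the sample payload are illustrative only and are not real model output.

```python
def extract_lines(result, min_score=0.5):
    """Return (text, score) pairs whose score is at least min_score."""
    # Assumption: the payload may be wrapped in a top-level "res" key.
    payload = result.get("res", result)
    # Assumption: parallel lists named "rec_texts" and "rec_scores".
    texts = payload.get("rec_texts", [])
    scores = payload.get("rec_scores", [])
    return [(text, score) for text, score in zip(texts, scores) if score >= min_score]


# Hypothetical payload for illustration; replace with json.load() on a saved file.
sample = {"res": {"rec_texts": ["Invoice", "T0tal"], "rec_scores": [0.98, 0.41]}}
print(extract_lines(sample))
```

Dropping low-confidence lines before handing text to an LLM is one simple way to keep obvious misreads out of the prompt; the 0.5 threshold here is arbitrary and should be tuned per document type.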

Why Use PaddleOCR

PaddleOCR is well suited to intelligent document applications for several reasons:

Industry-Leading Accuracy: It consistently achieves state-of-the-art performance in various OCR and document parsing benchmarks, including complex real-world scenarios.
Multilingual Support: With robust support for over 100 languages, it caters to global applications and diverse linguistic needs.
Comprehensive Functionality: Beyond basic text recognition, it offers advanced features like document parsing (PP-StructureV3), intelligent information extraction (PP-ChatOCRv4), and document translation (PP-DocTranslation).
Production-Ready and Efficient: Designed for practical deployment, PaddleOCR is lightweight, resource-efficient, and supports high-performance inference across various hardware, including CPU, GPU, XPU, and NPU.
Strong Community and Integrations: PaddleOCR is integrated into leading projects such as MinerU, RAGFlow, and Pathway, and is backed by an active community and extensive documentation.