Repository History

Explore all analyzed open-source repositories

Topic: ocr
ImageToolbox: Advanced Image Manipulation and Editing for Android

ImageToolbox is a powerful Android application offering extensive image manipulation capabilities. It provides a wide array of features, from basic tools like cropping and drawing to advanced options such as AI-powered enhancements, OCR, and a vast collection of filters. Built with Kotlin and Jetpack Compose, it delivers a modern and efficient user experience for both casual users and professionals.

Mar 18, 2026
PaddleOCR: A Powerful OCR Toolkit for Structured Document Data

PaddleOCR is an industry-leading, production-ready OCR and document AI engine that transforms any PDF or image document into structured, AI-friendly data. It offers end-to-end solutions from text extraction to intelligent document understanding, supporting over 100 languages with high accuracy and efficiency.

Mar 14, 2026
PDF Craft: Convert Scanned PDF Books to Markdown and EPUB

PDF Craft is a Python library designed to convert PDF files, especially scanned books, into various formats like Markdown and EPUB. Leveraging DeepSeek OCR, it accurately extracts text, tables, and formulas while preserving document structure. The project offers a fast, local conversion process, making it ideal for digitizing complex documents.

Mar 9, 2026
Unstructured: Open-Source Pre-Processing for Complex Document Data

The `unstructured` library is an open-source ETL solution designed to convert complex, unstructured documents into clean, structured data. It streamlines the data processing workflow for language models, offering tools for ingesting and pre-processing various document types like PDFs, HTML, and Word documents. This library simplifies the transformation of raw information into formats suitable for advanced AI applications.

Feb 10, 2026
text-extract-api: Advanced Document Extraction, OCR, and PII Removal with LLMs

text-extract-api is an API for extracting and parsing text from various document formats, including PDF, Word, and PPTX. It uses modern OCR engines and Ollama-supported LLMs for text extraction, PII removal, and conversion to structured JSON or Markdown, and its self-hosted architecture keeps document data private.

Oct 12, 2025