Repository History

7 repositories tagged with OCR

Topic: OCR

GLM-OCR: Accurate, Fast, and Comprehensive Multimodal OCR Model

GLM-OCR is an open-source multimodal OCR model for complex document understanding, built on the GLM-V encoder-decoder architecture. It reports state-of-the-art performance across several benchmarks, with efficient inference and straightforward integration, and is optimized for real-world business scenarios.

Analyzed May 28, 2026

OmniParse: Ingest, Parse, and Optimize Data for GenAI Frameworks

OmniParse is a platform that ingests, parses, and optimizes unstructured data, from documents to multimedia, into structured, actionable formats. It prepares that data for GenAI applications such as RAG and fine-tuning, simplifying data preparation for AI.

Analyzed Apr 7, 2026

docling-api: Scalable Document to Markdown Conversion Server

docling-api is a scalable backend server for converting a range of document formats, including PDF, DOCX, and images, into Markdown. Built with FastAPI, Celery, and Redis, it supports both CPU and GPU processing and suits large-scale workflows that need text, table, and image extraction along with OCR. It exposes synchronous and asynchronous API endpoints for single and batch document conversions.

Analyzed Jan 30, 2026

Papermerge DMS: Open Source Document Management for Digital Archives

Papermerge DMS is an open-source document management system designed for scanned documents and digital archives. It uses OCR to extract and index text, enabling full-text search and efficient organization. Its modern web UI provides a desktop-like experience for managing various document formats.

Analyzed Jan 8, 2026

Kreuzberg: A Polyglot Document Intelligence Framework with a Rust Core

Kreuzberg is a polyglot document intelligence framework built on a high-performance Rust core. It extracts text, metadata, and structured information from over 50 file formats, including PDFs, Office documents, and images. Developers can use Kreuzberg from Rust, Python, Ruby, Go, and Node.js, or through its CLI, REST API, or MCP server.

Analyzed Dec 30, 2025

Marker: High-Accuracy Document Conversion to Markdown and JSON

Marker is an open-source Python tool for high-accuracy conversion of documents such as PDFs, images, and office files into Markdown, JSON, and HTML. It preserves complex formatting, extracts images, and can use LLMs to improve precision further, which makes it suited to structured document intelligence.

Analyzed Nov 9, 2025

Ollama-OCR: Advanced OCR with Vision Language Models via Ollama

Ollama-OCR is a Python package and Streamlit application for Optical Character Recognition. It uses vision language models, accessed through Ollama, to extract text from both images and PDF documents. Features include support for multiple models, various output formats, and batch processing.

Analyzed Oct 12, 2025
