OSRepos
Repository History

Explore all analyzed open source repositories

Topic: pdf
PDF Craft: Convert Scanned PDF Books to Markdown and EPUB

PDF Craft is a Python library that converts PDF files, especially scanned books, into formats such as Markdown and EPUB. It uses DeepSeek OCR to extract text, tables, and formulas while preserving document structure. Conversion is fast and runs locally, which suits digitizing complex documents.

Mar 9, 2026
pdfplumber: Extracting Data from PDFs with Ease and Precision

pdfplumber is a Python library for extracting detailed information from PDFs, including characters, rectangles, and lines. It also extracts text and tables, which makes it useful for data analysis and automation. It is built on pdfminer.six.

Jan 24, 2026
text-extract-api: Advanced Document Extraction, OCR, and PII Removal with LLMs

text-extract-api is an API for extracting and parsing text from document formats including PDF, Word, and PPTX. It uses modern OCR engines and Ollama-supported LLMs for text extraction, PII removal, and conversion to structured JSON or Markdown. Because it is self-hosted, documents stay under the operator's control.

Oct 12, 2025
Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️