OS
OSRepos
HomeRepositoriesRSS

Repository History

Explore all analyzed open source repositories

Topic: document-processing
pdfplumber: Extracting Data from PDFs with Ease and Precision

pdfplumber: Extracting Data from PDFs with Ease and Precision

pdfplumber is a powerful Python library designed to extract detailed information from PDFs, including characters, rectangles, and lines. It excels at easily extracting text and tables, making it an invaluable tool for data analysis and automation. Built on pdfminer.six, it provides robust PDF parsing capabilities.

Jan 24, 2026
View Details
E2M: Convert Various File Types to Markdown for RAG and LLM Training

E2M: Convert Various File Types to Markdown for RAG and LLM Training

E2M is a Python library designed to convert diverse file types, including documents, web pages, and audio, into Markdown format. It features a robust parser-converter architecture, making it highly flexible and easy to integrate. This tool is specifically aimed at generating high-quality data for Retrieval-Augmented Generation (RAG) and large language model training.

Dec 24, 2025
View Details
sumy: Automatic Text Summarization for Documents and HTML Pages

sumy: Automatic Text Summarization for Documents and HTML Pages

sumy is a robust Python module designed for automatic summarization of text documents and HTML pages. It provides various summarization methods, supports multiple natural languages, and offers both a command-line utility and a flexible Python API. This versatile tool enables users to efficiently extract concise summaries from lengthy content.

Dec 14, 2025
View Details
Page 1
OS
OSRepos

Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

Monitor your website with YourWebsiteScore

Navigation

HomeRepositoriesSitemapRSS Feed

Legal

Privacy PolicyCookie Policy

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️