OS
OSRepos
HomeRepositoriesRSS

Repository History

Explore all analyzed open source repositories

Topic: text-extraction
E2M: Convert Various File Types to Markdown for RAG and LLM Training

E2M: Convert Various File Types to Markdown for RAG and LLM Training

E2M is a Python library designed to convert diverse file types, including documents, web pages, and audio, into Markdown format. It features a robust parser-converter architecture, making it highly flexible and easy to integrate. This tool is specifically aimed at generating high-quality data for Retrieval-Augmented Generation (RAG) and large language model training.

Dec 24, 2025
View Details
sumy: Automatic Text Summarization for Documents and HTML Pages

sumy: Automatic Text Summarization for Documents and HTML Pages

sumy is a robust Python module designed for automatic summarization of text documents and HTML pages. It provides various summarization methods, supports multiple natural languages, and offers both a command-line utility and a flexible Python API. This versatile tool enables users to efficiently extract concise summaries from lengthy content.

Dec 14, 2025
View Details
Page 1
OS
OSRepos

Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

Monitor your website with YourWebsiteScore

Navigation

HomeRepositoriesSitemapRSS Feed

Legal

Privacy PolicyCookie Policy

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️