Repository History
Explore all analyzed open-source repositories

E2M: Convert Various File Types to Markdown for RAG and LLM Training
E2M is a Python library that converts diverse file types, including documents, web pages, and audio, into Markdown. Its parser-converter architecture keeps input handling separate from Markdown generation, which makes the library flexible and easy to integrate. The tool is aimed at generating high-quality data for Retrieval-Augmented Generation (RAG) and large language model (LLM) training.

sumy: Automatic Text Summarization for Documents and HTML Pages
sumy is a Python module for automatic summarization of text documents and HTML pages. It provides several summarization methods, supports multiple natural languages, and offers both a command-line utility and a Python API for extracting concise summaries from lengthy content.
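
To make the idea of automatic summarization concrete, here is a minimal sketch of frequency-based extractive summarization using only the Python standard library. It is not sumy's implementation or API: the function names, the tiny stop-word list, and the scoring rule are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; real tools ship
# much larger, per-language lists.
STOP_WORDS = {"a", "an", "and", "are", "as", "in", "is", "it", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def split_sentences(text):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, sentence_count=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    frequencies = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(frequencies[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score; ties keep document order (stable sort).
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:sentence_count])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    sample = "Cats sleep a lot. Cats eat fish. Dogs bark."
    for line in summarize(sample, sentence_count=2):
        print(line)
```

Sentences are scored by the average frequency of their non-stop words, and the selected sentences are returned in document order so the summary stays readable. Production summarizers typically add proper sentence tokenization, stemming, and language-specific stop-word lists.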