2 repositories tagged with llm-data

E2M: Convert Various File Types to Markdown for RAG and LLM Training
E2M is a Python library that converts documents, web pages, audio, and other file types into Markdown. It is built on a parser-converter architecture: parsers extract content from each source type, and converters turn that content into Markdown, which keeps the library flexible and easy to integrate. The tool is aimed at producing high-quality data for Retrieval-Augmented Generation (RAG) and large language model training.
Analyzed Dec 24, 2025
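
As a rough illustration of what a parser-converter split can look like, here is a minimal Python sketch. It is not E2M's API: `ParsedBlock`, `parse_plain_text`, `convert_to_markdown`, `PARSERS`, and `to_markdown` are all hypothetical names, and the rule that a line ending in a colon is a heading is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ParsedBlock:
    """Hypothetical intermediate representation shared by parsers and converters."""
    kind: str  # "heading" or "paragraph"
    text: str


def parse_plain_text(raw: str) -> List[ParsedBlock]:
    """Toy parser: a non-empty line ending in ':' is a heading, anything else a paragraph."""
    blocks: List[ParsedBlock] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            blocks.append(ParsedBlock("heading", line[:-1]))
        else:
            blocks.append(ParsedBlock("paragraph", line))
    return blocks


def convert_to_markdown(blocks: List[ParsedBlock]) -> str:
    """Toy converter: render the intermediate blocks as Markdown text."""
    parts = [f"## {b.text}" if b.kind == "heading" else b.text for b in blocks]
    return "\n\n".join(parts)


# Hypothetical registry: one parser per file extension, one shared converter.
PARSERS: Dict[str, Callable[[str], List[ParsedBlock]]] = {
    ".txt": parse_plain_text,
}


def to_markdown(raw: str, extension: str) -> str:
    """Pick a parser by extension, then hand its output to the converter."""
    parser = PARSERS[extension]
    return convert_to_markdown(parser(raw))


print(to_markdown("Intro:\nHello world.\n", ".txt"))
```

The point of the split is that supporting a new source type only requires registering another parser; the Markdown converter stays unchanged.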

AnyCrawl: A High-Performance Node.js/TypeScript Web Crawler for LLM Data
AnyCrawl is a Node.js/TypeScript web crawler that turns websites into LLM-ready data. It also extracts structured search engine results page (SERP) data from several search engines and uses native multi-threading for bulk processing, which makes it well suited to large-scale data collection.
Analyzed Oct 12, 2025