6 repositories tagged with Web Scraping

Topic: Web Scraping
Firecrawl: Web Scraping and Interaction API for AI Agents

Firecrawl is an open-source API designed to empower AI agents and applications with clean, structured web data. It provides robust capabilities for searching, scraping, and interacting with the web at scale, effectively transforming complex web content into LLM-ready formats. This tool handles the intricate challenges of web data extraction, allowing developers to focus on building intelligent applications.

Analyzed May 13, 2026
Trafilatura: Advanced Web Scraping and Text Extraction in Python

Trafilatura is a robust Python package and command-line tool designed for gathering text and metadata from the web. It simplifies web crawling, scraping, and content extraction, transforming raw HTML into structured data. Widely adopted by major companies and institutions, it offers high efficiency and accuracy for various text processing needs.

Analyzed May 1, 2026
Attachments: The Python Funnel for LLM Context and Multimodal Data

Attachments simplifies providing context to Large Language Models by transforming various file types into model-ready text and images. This Python library acts as a universal funnel, enabling developers to integrate diverse data sources like PDFs, images, web content, and even entire code repositories with just a few lines of code. It supports popular LLM APIs and frameworks, making multimodal AI development more accessible.

Analyzed Nov 24, 2025
python-readability: Extract Clean Main Content from HTML Documents

python-readability is a fast Python port of arc90's Readability tool that extracts and cleans the main body text and title from an HTML document. It offers an efficient way to reduce a web page to its essential content. The library is regularly updated to match the latest readability.js functionality.

Analyzed Nov 7, 2025
sitefetch: Efficiently Scrape Websites for AI Model Training and Analysis

sitefetch is a powerful command-line utility designed to fetch and save entire websites as plain text files. This tool is particularly useful for preparing large datasets for AI model training, allowing easy consumption of web content. It offers flexible options for page matching and content selection, ensuring relevant data extraction.

Analyzed Oct 12, 2025
Scrapling: An Undetectable, Powerful, and Adaptive Python Web Scraping Library

Scrapling is a high-performance Python library for web scraping. Its adaptive features adjust automatically to website changes, and its stealth features are designed to bypass anti-bot systems, which makes it a robust option for modern web data extraction.

Analyzed Oct 11, 2025
OSRepos

Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of third-party repository code is at your own risk. Always review source code, dependencies, licenses, and security implications before running anything.

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️