Repository History

Explore all analyzed open source repositories

Topic: web-scraping
Cheerio: Fast and Flexible HTML/XML Parsing and Manipulation Library

Cheerio is a popular Node.js library for parsing and manipulating HTML and XML documents. Its jQuery-like API lets developers select, traverse, and modify elements using familiar syntax. Known for its speed and flexibility, Cheerio is a strong choice for web scraping and server-side DOM manipulation.

May 13, 2026
Scraperr: A Powerful Self-Hosted Web Scraping Solution

Scraperr is a self-hosted web scraping tool that lets users extract data from websites without writing any code. It supports XPath-based extraction, queue management, domain spidering, and several data export options, providing a single platform for efficient, controlled web data collection.

Feb 16, 2026
brightdata-mcp: Empowering AI with Real-time Web Access and Data Scraping

brightdata-mcp is a Model Context Protocol (MCP) server developed by Bright Data to give AI agents real-time web access. It is designed as an all-in-one solution for interacting with the public web, so that Large Language Models (LLMs) can retrieve live information without being blocked or stopped by CAPTCHAs. The open-source project offers web scraping, browser automation, and data extraction capabilities.

Jan 22, 2026
13ft: Self-Hosted Paywall Bypass and Ad Blocker

13ft is a self-hosted Python application that bypasses paywalls and blocks ads on many websites, including some that services like 12ft.io miss. It works by impersonating Googlebot to retrieve the full content of articles. The open-source tool offers a flexible option for readers who want to access restricted content.

Dec 10, 2025
Browserable: Open Source Browser Automation for AI Agents

Browserable is an open-source, self-hostable library that gives AI agents browser automation capabilities. It lets agents navigate websites, fill out forms, click buttons, and extract information. With its reported strong results on the WebVoyager benchmark, Browserable offers a solid foundation for building AI-driven web interactions.

Oct 12, 2025
AnyCrawl: A High-Performance Node.js/TypeScript Web Crawler for LLM Data

AnyCrawl is a Node.js/TypeScript web crawler that turns websites into LLM-ready data. It extracts structured search engine results page (SERP) data from multiple search engines and supports native multi-threading for bulk processing, which makes it well suited to large-scale data collection.

Oct 12, 2025