Repository History
Explore all analyzed open-source repositories

Cheerio: Fast and Flexible HTML/XML Parsing and Manipulation Library
Cheerio is a popular library for parsing and manipulating HTML and XML documents in Node.js. It provides a jQuery-like API, so elements can be selected, traversed, and modified with familiar syntax. Because it works on a simple, consistent DOM model and does not render pages, apply CSS, or execute JavaScript, Cheerio is fast and lightweight, which makes it a strong choice for web scraping and server-side DOM manipulation.

Scraperr: A Powerful Self-Hosted Web Scraping Solution
Scraperr is a self-hosted web scraping tool that lets users extract data from websites without writing any code. It supports XPath-based extraction, queue management, domain spidering, and several data export options, giving users a single platform for efficient, controlled web data collection.

brightdata-mcp: Empowering AI with Real-time Web Access and Data Scraping
brightdata-mcp is a Model Context Protocol (MCP) server developed by Bright Data to give AI agents real-time web access. It aims to be an all-in-one solution for interacting with the public web, letting Large Language Models (LLMs) retrieve live information without being blocked or stopped by CAPTCHAs. The open-source project provides web scraping, browser automation, and data extraction capabilities.

13ft: Self-Hosted Paywall Bypass and Ad Blocker
13ft is a self-hosted Python application that bypasses paywalls and blocks ads on many websites, including some that services like 12ft.io may miss. It works by impersonating GoogleBot, Google's web crawler, to retrieve the full content of articles. As an open-source tool, it gives users a flexible way to read restricted content.

Browserable: Open Source Browser Automation for AI Agents
Browserable is an open-source, self-hostable library that gives AI agents advanced browser automation capabilities. Agents can use it to navigate websites, fill out forms, click buttons, and extract information. Its strong results on the WebVoyager benchmark make it a solid foundation for building AI-driven web interactions.

AnyCrawl: A High-Performance Node.js/TypeScript Web Crawler for LLM Data
AnyCrawl is a Node.js/TypeScript web crawler that turns websites into LLM-ready data. It extracts structured search engine results page (SERP) data from multiple search engines and uses native multi-threading for efficient bulk processing, which makes it well suited to large-scale data collection.