Repository History
Explore all analyzed open source repositories

Awesome-crawler: A Curated List of Web Crawlers and Spiders
Awesome-crawler is a GitHub repository that curates web crawling and scraping tools across many programming languages. It gives developers a quick survey of popular frameworks and libraries, making it easier to choose the right tool for a web scraping project.

Scraperr: A Powerful Self-Hosted Web Scraping Solution
Scraperr is a self-hosted web scraper that lets users extract data from websites without writing code. It offers XPath-based extraction, job queue management, domain spidering, and several data export options, providing a single platform for controlled web data collection.
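Scraperr itself is no-code, but the XPath-based extraction it relies on is easy to illustrate. A minimal sketch using Python's standard library (the HTML fragment and class names are made up for the example; ElementTree supports only a limited XPath subset):

```python
from xml.etree import ElementTree

# A small, well-formed fragment standing in for a scraped page.
page = """
<html>
  <body>
    <div class="product"><span>Widget</span><span>9.99</span></div>
    <div class="product"><span>Gadget</span><span>4.50</span></div>
  </body>
</html>
"""

tree = ElementTree.fromstring(page)
# This XPath selects every <div> with class="product" under the root.
products = tree.findall(".//div[@class='product']")
rows = [[span.text for span in div.findall("span")] for div in products]
print(rows)  # [['Widget', '9.99'], ['Gadget', '4.50']]
```

A real scraper would fetch the page over HTTP and use a lenient HTML parser, but the selection step works the same way: one XPath expression per field.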

pdfplumber: Extracting Data from PDFs with Ease and Precision
pdfplumber is a Python library for extracting detailed information from PDFs, down to individual characters, rectangles, and lines. It is particularly good at pulling text and tables out of documents, which makes it a useful tool for data analysis and automation. It is built on pdfminer.six, which supplies the underlying PDF parsing.

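The basic pdfplumber workflow is open a file, iterate over its pages, and extract. A minimal sketch (the "report.pdf" path is a hypothetical placeholder, and pdfplumber must be installed, e.g. via pip):

```python
def summarize_pdf(path):
    """Return the text and tables of every page of a PDF.

    Uses pdfplumber's open()/extract_text()/extract_tables() calls;
    the import is deferred so this sketch loads without the library.
    """
    import pdfplumber  # pip install pdfplumber

    pages = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            pages.append({
                "text": page.extract_text(),      # plain text in reading order
                "tables": page.extract_tables(),  # each table as rows of cell strings
            })
    return pages

# Usage (hypothetical file): summarize_pdf("report.pdf")
```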
brightdata-mcp: Empowering AI with Real-time Web Access and Data Scraping
brightdata-mcp is a Model Context Protocol (MCP) server developed by Bright Data that gives AI agents real-time web access. It lets Large Language Models (LLMs) retrieve live information without being stopped by blocks or CAPTCHAs, and the open-source project bundles web scraping, browser automation, and data extraction capabilities.
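Under the hood, MCP is JSON-RPC 2.0: a client invokes a server-side tool with a tools/call request. A sketch of what such a message looks like (the search_engine tool name and its arguments are hypothetical placeholders, not confirmed names from brightdata-mcp):

```python
import json

# Build a JSON-RPC 2.0 request in the shape MCP uses for tool calls.
# The tool name and arguments below are hypothetical placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_engine",
        "arguments": {"query": "open source web crawlers"},
    },
}

wire = json.dumps(request)  # this is what goes over stdio or HTTP to the server
print(wire)
```

The server replies with a matching JSON-RPC response whose result carries the tool's output, which the client hands back to the LLM.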

AnyCrawl: A High-Performance Node.js/TypeScript Web Crawler for LLM Data
AnyCrawl is a Node.js/TypeScript web crawler that turns websites into LLM-ready data. It extracts structured SERP results from several search engines and uses native multi-threading for bulk processing, which suits large-scale data collection.
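The multi-threaded bulk processing idea can be illustrated with a generic worker-pool pattern. This is a Python analogy, not AnyCrawl's Node.js code, and fetch() is a stand-in that a real crawler would replace with an HTTP request:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a real HTTP fetch + parse step; returns a fake record.
    return {"url": url, "status": "ok"}

urls = [f"https://example.com/page/{i}" for i in range(8)]

# Process the URL list concurrently, as a multi-threaded crawler would,
# with a bounded pool so the target site is not overwhelmed.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, urls))

print(len(results))  # 8
```

Bounding the worker count is the key design choice: it keeps throughput high for bulk jobs while capping concurrent load on any one domain.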