{"name":"Scrapling: An Undetectable, Powerful, and Adaptive Python Web Scraping Library","description":"Scrapling is a high-performance Python library designed for effortless web scraping. It stands out with its adaptive capabilities, automatically adjusting to website changes, and advanced stealth features to bypass anti-bot systems. This makes it a robust solution for modern web data extraction needs.","github":"https://github.com/D4Vinci/Scrapling","url":"https://osrepos.com/repo/d4vinci-scrapling","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/d4vinci-scrapling","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/d4vinci-scrapling.md","json":"https://osrepos.com/repo/d4vinci-scrapling.json","topics":["Python","Web Scraping","Data Extraction","Automation","AI","Crawler","Stealth","Playwright"],"keywords":["Python","Web Scraping","Data Extraction","Automation","AI","Crawler","Stealth","Playwright"],"stars":null,"summary":"Scrapling is a high-performance Python library designed for effortless web scraping. It stands out with its adaptive capabilities, automatically adjusting to website changes, and advanced stealth features to bypass anti-bot systems. This makes it a robust solution for modern web data extraction needs.","content":"## Introduction\n\nScrapling is an advanced, high-performance Python library designed to make web scraping easy and effortless. It stands out by offering undetectable, powerful, and flexible capabilities, making it a robust solution for modern web data extraction challenges. Unlike traditional scraping tools, Scrapling is an *adaptive* library that learns from website changes, automatically relocating elements and keeping your scrapers running even after structural updates. 
Built by web scrapers for web scrapers, it provides a comprehensive suite of tools for both beginners and experienced developers.\n\n## Why Use Scrapling?\n\nScrapling offers a combination of features that address common web scraping pain points:\n\n### Adaptive Scraping & AI Integration\n*   **Smart Element Tracking**: Automatically relocates elements after website changes using similarity algorithms, reducing maintenance.\n*   **Smart Flexible Selection**: Supports CSS selectors, XPath, filter-based search, text search, and regex.\n*   **Find Similar Elements**: Easily locate elements similar to those already found.\n*   **MCP Server for AI**: Features a built-in MCP (Model Context Protocol) server for AI-assisted web scraping, optimizing data extraction and minimizing token usage with AI tools like Claude or Cursor.\n\n### Advanced Website Fetching with Session Support\n*   **HTTP Requests**: Perform fast and stealthy HTTP requests with `Fetcher`, which impersonates browser TLS fingerprints and headers and supports HTTP/3.\n*   **Dynamic Loading**: Handle dynamic websites with full browser automation using `DynamicFetcher`, supporting Playwright's Chromium, real Chrome, and custom stealth modes.\n*   **Anti-bot Bypass**: `StealthyFetcher` provides advanced stealth capabilities, including a modified Firefox and fingerprint spoofing, to bypass Cloudflare's Turnstile and Interstitial challenges.\n*   **Session Management**: Maintain state and cookies across requests with `FetcherSession`, `StealthySession`, and `DynamicSession`.\n*   **Async Support**: Full asynchronous support across all fetchers and session classes for high-concurrency scraping.\n\n### High-Performance & Battle-tested Architecture\n*   **Lightning Fast**: Optimized for speed, often outperforming other Python scraping libraries.\n*   **Memory Efficient**: Uses optimized data structures and lazy loading to ensure a minimal 
memory footprint.\n*   **Fast JSON Serialization**: Offers significantly faster JSON serialization compared to the standard library.\n*   **Battle-tested**: With 92% test coverage and full type hints, Scrapling has been rigorously tested and used daily by hundreds of web scrapers.\n\n### Developer-Friendly Experience\n*   **Interactive Web Scraping Shell**: An optional built-in IPython shell with Scrapling integration, shortcuts, and tools to accelerate script development.\n*   **CLI Usage**: Scrape URLs directly from the terminal without writing any Python code.\n*   **Rich Navigation API**: Advanced DOM traversal methods for parent, sibling, and child navigation.\n*   **Enhanced Text Processing**: Built-in regex, cleaning methods, and optimized string operations.\n*   **Auto Selector Generation**: Generate robust CSS/XPath selectors for any element.\n*   **Familiar API**: An API similar to Scrapy/BeautifulSoup, using the same pseudo-elements found in Scrapy/Parsel.\n*   **Complete Type Coverage**: Full type hints for excellent IDE support and code completion.\n*   **Ready Docker Image**: A Docker image containing all browsers is automatically built and pushed with each release.\n\n## Installation\n\nScrapling requires Python 3.10 or higher.\n\nTo install the core parser engine:\n\n```bash\npip install scrapling\n```\n\nFor fetchers and command-line tools, install optional dependencies:\n\n```bash\npip install \"scrapling[fetchers]\"\nscrapling install # Downloads browser dependencies\n```\n\nOther optional features:\n*   **AI (MCP server)**: `pip install \"scrapling[ai]\"`\n*   **Shell features**: `pip install \"scrapling[shell]\"`\n*   **All features**: `pip install \"scrapling[all]\"`\n\nRemember to run `scrapling install` after installing any extras if you haven't already.\n\nAlternatively, use the Docker image with all extras and browsers:\n\n```bash\ndocker pull pyd4vinci/scrapling\n```\n\n## Examples\n\nHere are some examples demonstrating Scrapling's capabilities:\n\n### 
Basic Usage with Fetchers and Sessions\n\n```python\nfrom scrapling.fetchers import FetcherSession, StealthySession, DynamicSession\n\n# HTTP requests with session support\nwith FetcherSession(impersonate='chrome') as session:  # Use the latest Chrome TLS fingerprint\n    page = session.get('https://quotes.toscrape.com/', stealthy_headers=True)\n    quotes = page.css('.quote .text::text')\n    print(f\"Quotes from FetcherSession: {quotes}\")\n\n# Advanced stealth mode (keeps the browser open until the block exits)\nwith StealthySession(headless=True, solve_cloudflare=True) as session:\n    page = session.fetch('https://nopecha.com/demo/cloudflare', google_search=False)\n    data = page.css('#padded_content a')\n    print(f\"Data from StealthySession: {data}\")\n\n# Full browser automation (keeps the browser open until the block exits)\nwith DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:\n    page = session.fetch('https://quotes.toscrape.com/', load_dom=False)\n    data = page.xpath('//span[@class=\"text\"]/text()')  # XPath selector if you prefer it\n    print(f\"Data from DynamicSession: {data}\")\n```\n\n### Advanced Parsing & Navigation\n\n```python\nfrom scrapling.fetchers import Fetcher\n\npage = Fetcher.get('https://quotes.toscrape.com/')\n\n# Get quotes with multiple selection methods\nquotes_css = page.css('.quote')  # CSS selector\nquotes_xpath = page.xpath('//div[@class=\"quote\"]')  # XPath\nquotes_find_all = page.find_all('div', class_='quote')  # BeautifulSoup-style\n\nprint(f\"First quote text (CSS): {quotes_css.css_first('.text::text')}\")\n\n# Advanced navigation: the next sibling of the first quote is the second quote\nfirst_quote = page.css_first('.quote')\nnext_author = first_quote.next_sibling.css('.author::text')\nprint(f\"Author of the next quote: {next_author}\")\n\n# Element relationships and similarity\nsimilar_elements = first_quote.find_similar()\nprint(f\"Found {len(similar_elements)} similar elements to the first quote.\")\n```\n\n### CLI Usage\n\nScrapling also provides a command-line interface:\n\n```bash\n# Launch interactive Web Scraping shell\nscrapling shell\n\n# Extract content to a file\nscrapling extract get 'https://example.com' content.md --css-selector '#fromSkipToProducts' --impersonate 'chrome'\n```\n\n## Links\n\n*   **GitHub Repository**: <a href=\"https://github.com/D4Vinci/Scrapling\" target=\"_blank\">D4Vinci/Scrapling</a>\n*   **Official Documentation**: <a href=\"https://scrapling.readthedocs.io/en/latest/\" target=\"_blank\">Scrapling ReadTheDocs</a>\n*   **PyPI Project Page**: <a href=\"https://pypi.org/project/scrapling/\" target=\"_blank\">Scrapling on PyPI</a>","metrics":{"detailViews":4,"githubClicks":1},"dates":{"published":null,"modified":"2025-10-11T21:48:11.000Z"}}