# Scrapling: An Undetectable, Powerful, and Adaptive Python Web Scraping Library

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Repository profile: https://osrepos.com/repo/d4vinci-scrapling

Scrapling is a high-performance Python library designed for effortless web scraping. It stands out with its adaptive capabilities, automatically adjusting to website changes, and advanced stealth features to bypass anti-bot systems. This makes it a robust solution for modern web data extraction needs.

GitHub: https://github.com/D4Vinci/Scrapling

## Topics

- Python
- Web Scraping
- Data Extraction
- Automation
- AI
- Crawler
- Stealth
- Playwright

## Repository Information

Last analyzed by OSRepos: Sat Oct 11 2025 22:48:11 GMT+0100 (Western European Summer Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Introduction

Scrapling is a high-performance Python library that aims to make web scraping straightforward. It combines stealthy fetching, a fast parser, and flexible selection methods to handle modern web data extraction. Unlike traditional scraping tools, Scrapling is *adaptive*: it learns from website changes and automatically relocates elements, which keeps your scrapers running after structural updates. Built by web scrapers for web scrapers, it provides tools for both beginners and experienced developers.

## Why Use Scrapling?

Scrapling offers a unique combination of features that address common web scraping pain points:

### Adaptive Scraping & AI Integration
*   **Smart Element Tracking**: Automatically relocates elements after website changes using intelligent similarity algorithms, reducing maintenance.
*   **Smart Flexible Selection**: Supports CSS selectors, XPath, filter-based search, text search, and regex, providing versatile data extraction.
*   **Find Similar Elements**: Easily locate elements similar to those already found.
*   **MCP Server for AI**: Features a built-in MCP (Model Context Protocol) server for AI-assisted web scraping. It extracts targeted content before handing it to AI tools such as Claude or Cursor, which minimizes token usage.
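
The idea behind smart element tracking can be illustrated with a small standard-library sketch. This is not Scrapling's actual algorithm: the fingerprint fields, the equal weighting, and the names `fingerprint`, `walk`, `similarity`, and `relocate` are all assumptions made for illustration. The sketch saves a few properties of an element while its selector still works, then picks the closest match after the page structure changes.

```python
from difflib import SequenceMatcher
import xml.etree.ElementTree as ET


def fingerprint(el, path):
    """Capture properties of an element that tend to survive redesigns."""
    return {
        "tag": el.tag,
        "attrs": " ".join(f"{k}={v}" for k, v in sorted(el.attrib.items())),
        "text": (el.text or "").strip(),
        "path": path,
    }


def walk(root):
    """Yield (element, tag path) pairs for every element under root."""
    stack = [(root, root.tag)]
    while stack:
        el, path = stack.pop()
        yield el, path
        for child in el:
            stack.append((child, f"{path}/{child.tag}"))


def similarity(a, b):
    """Average the string similarity of each fingerprint field (0.0 to 1.0)."""
    ratios = [SequenceMatcher(None, a[k], b[k]).ratio() for k in a]
    return sum(ratios) / len(ratios)


def relocate(saved, root):
    """Return the element in the new tree that best matches the saved fingerprint."""
    best = max(walk(root), key=lambda pair: similarity(saved, fingerprint(*pair)))
    return best[0]


old_page = ET.fromstring(
    '<html><body><div class="product"><span class="price">19.99 USD</span></div></body></html>'
)
new_page = ET.fromstring(
    '<html><body><section class="item"><p>In stock</p>'
    '<span class="price-tag">18.49 USD</span></section></body></html>'
)

# Save a fingerprint while the original selector still works.
target = old_page.find(".//span[@class='price']")
saved = fingerprint(target, "html/body/div/span")

# After the redesign the old selector finds nothing, so fall back to similarity.
assert new_page.find(".//span[@class='price']") is None
found = relocate(saved, new_page)
print(found.tag, found.attrib, found.text)
```

A production implementation would likely weigh more signals (siblings, parent attributes, position) and apply a minimum score before accepting a match; the equal weighting here is only the simplest choice.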

### Advanced Website Fetching with Session Support
*   **HTTP Requests**: Perform fast and stealthy HTTP requests with `Fetcher`, which can impersonate browser TLS fingerprints and headers and supports HTTP/3.
*   **Dynamic Loading**: Handle dynamic websites with full browser automation using `DynamicFetcher`, supporting Playwright's Chromium, real Chrome, and custom stealth modes.
*   **Anti-bot Bypass**: `StealthyFetcher` provides advanced stealth capabilities, including a modified Firefox build and fingerprint spoofing, to bypass Cloudflare's Turnstile and Interstitial challenges.
*   **Session Management**: Maintain state and cookies across requests with `FetcherSession`, `StealthySession`, and `DynamicSession`.
*   **Async Support**: Full asynchronous support across all fetchers and session classes for high-concurrency scraping.
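
High-concurrency scraping with async fetchers usually follows one pattern: start many fetches at once, but cap how many are in flight. The sketch below shows that pattern with `asyncio` only. `fake_fetch` is a hypothetical stand-in for an async fetcher call so the example runs without network access, and `fetch_all` and its `limit` parameter are illustrative names rather than part of Scrapling.

```python
import asyncio


async def fake_fetch(url):
    """Hypothetical stand-in for an async fetcher call; no network is used."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"<html>{url}</html>"


async def fetch_all(urls, limit=3):
    """Fetch every URL concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fake_fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://quotes.toscrape.com/page/{n}/" for n in range(1, 6)]
pages = asyncio.run(fetch_all(urls))
print(len(pages))
```

To use this with a real fetcher, replace the body of `fake_fetch` with the library's async call as described in its documentation. The semaphore keeps the target site from being hit by every request simultaneously.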

### High-Performance & Battle-tested Architecture
*   **Lightning Fast**: Optimized for superior performance, often outperforming many other Python scraping libraries.
*   **Memory Efficient**: Utilizes optimized data structures and lazy loading to ensure a minimal memory footprint.
*   **Fast JSON Serialization**: Offers significantly faster JSON serialization compared to the standard library.
*   **Battle-tested**: Scrapling has 92% test coverage and full type hints, and hundreds of web scrapers use it daily.

### Developer-Friendly Experience
*   **Interactive Web Scraping Shell**: An optional built-in IPython shell with Scrapling integration, shortcuts, and tools to accelerate script development.
*   **CLI Usage**: Scrape URLs directly from the terminal without writing any Python code.
*   **Rich Navigation API**: Advanced DOM traversal methods for parent, sibling, and child navigation.
*   **Enhanced Text Processing**: Built-in regex, cleaning methods, and optimized string operations.
*   **Auto Selector Generation**: Generate robust CSS/XPath selectors for any element.
*   **Familiar API**: An API similar to Scrapy/BeautifulSoup, using the same pseudo-elements found in Scrapy/Parsel.
*   **Complete Type Coverage**: Full type hints for excellent IDE support and code completion.
*   **Ready Docker Image**: A Docker image containing all browsers is automatically built and pushed with each release.
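
To make the idea of automatic selector generation concrete, here is a minimal sketch built on the standard library's `xml.etree.ElementTree`. It is not Scrapling's generator; the function name `css_path` and the strategy (stop at the nearest `id`, otherwise disambiguate siblings with `:nth-of-type`) are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def css_path(root, target):
    """Build a CSS selector from root to target, using ids and :nth-of-type."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    parts = []
    node = target
    while node is not None:
        if "id" in node.attrib:
            # An id is assumed to be unique, so the path can stop here.
            parts.append(f"#{node.attrib['id']}")
            break
        parent = parents.get(node)
        if parent is None:
            parts.append(node.tag)
        else:
            same_tag = [c for c in parent if c.tag == node.tag]
            if len(same_tag) > 1:
                parts.append(f"{node.tag}:nth-of-type({same_tag.index(node) + 1})")
            else:
                parts.append(node.tag)
        node = parent
    return " > ".join(reversed(parts))


doc = ET.fromstring(
    '<html><body><div id="main"><ul><li>a</li><li>b</li></ul></div></body></html>'
)
second_item = doc.findall(".//li")[1]
selector = css_path(doc, second_item)
print(selector)
```

Anchoring on the nearest `id` keeps the selector short and less sensitive to changes higher up in the tree, which is one common way to make generated selectors more robust.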

## Installation

Scrapling requires Python 3.10 or higher.

To install the core parser engine:

```bash
pip install scrapling
```


For fetchers and command-line tools, install optional dependencies:

```bash
pip install "scrapling[fetchers]"
scrapling install  # Downloads browser dependencies
```


Other optional features:
*   **AI (MCP server)**: `pip install "scrapling[ai]"`
*   **Shell features**: `pip install "scrapling[shell]"`
*   **All features**: `pip install "scrapling[all]"`

Remember to run `scrapling install` after installing any extras if you haven't already.
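
A quick way to confirm the basics before writing a scraper is to check the interpreter version and whether the package is importable. The sketch below uses only the standard library; the helper names `check_environment` and `installed_version` are illustrative, and the check does not verify that `scrapling install` has downloaded the browsers.

```python
import sys
from importlib import metadata, util

MIN_PYTHON = (3, 10)  # minimum version stated in this section


def check_environment(package="scrapling"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
    if util.find_spec(package) is None:
        problems.append(f"{package} is not installed")
    return problems


def installed_version(package="scrapling"):
    """Return the installed distribution version, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(check_environment() or "environment looks ready")
print(installed_version())
```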

Alternatively, use the Docker image with all extras and browsers:

```bash
docker pull pyd4vinci/scrapling
```


## Examples

Here are some examples demonstrating Scrapling's capabilities:

### Basic Usage with Fetchers and Sessions

```python
from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher
from scrapling.fetchers import FetcherSession, StealthySession, DynamicSession

# HTTP requests with session support
with FetcherSession(impersonate='chrome') as session:  # Use latest version of Chrome's TLS fingerprint
    page = session.get('https://quotes.toscrape.com/', stealthy_headers=True)
    quotes = page.css('.quote .text::text')
    print(f"Quotes from FetcherSession: {quotes}")

# Advanced stealth mode (keeps the browser open until you finish)
with StealthySession(headless=True, solve_cloudflare=True) as session:
    page = session.fetch('https://nopecha.com/demo/cloudflare', google_search=False)
    data = page.css('#padded_content a')
    print(f"Data from StealthySession: {data}")

# Full browser automation (keeps the browser open until you finish)
with DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:
    page = session.fetch('https://quotes.toscrape.com/', load_dom=False)
    data = page.xpath('//span[@class="text"]/text()')  # XPath selector if you prefer it
    print(f"Data from DynamicSession: {data}")
```


### Advanced Parsing & Navigation

```python
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')

# Get quotes with multiple selection methods
quotes_css = page.css('.quote')  # CSS selector
quotes_xpath = page.xpath('//div[@class="quote"]')  # XPath
quotes_find_all = page.find_all('div', class_='quote')  # BeautifulSoup-style

print(f"First quote text (CSS): {quotes_css.css_first('.text::text')}")

# Advanced navigation
first_quote = page.css_first('.quote')
# next_sibling moves to the element after the first quote, so this is that element's author
next_author = first_quote.next_sibling.css('.author::text')
print(f"Author of the next quote: {next_author}")

# Element relationships and similarity
similar_elements = first_quote.find_similar()
print(f"Found {len(similar_elements)} similar elements to the first quote.")
```


### CLI Usage

Scrapling also provides a powerful command-line interface:

```bash
# Launch interactive Web Scraping shell
scrapling shell

# Extract content to a file
scrapling extract get 'https://example.com' content.md --css-selector '#fromSkipToProducts' --impersonate 'chrome'
```
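
If you want to drive the CLI from a Python script, for example to batch several URLs, one option is to build the argument list and hand it to `subprocess`. The sketch below uses only the subcommand and flags shown in this section; the helper names `build_extract_command` and `run_extract` are hypothetical, and the command itself is only run when you call `run_extract`.

```python
import shlex
import subprocess


def build_extract_command(url, output_file, css_selector=None, impersonate=None):
    """Build the argv list for `scrapling extract get` using the flags shown in this section."""
    command = ["scrapling", "extract", "get", url, output_file]
    if css_selector:
        command += ["--css-selector", css_selector]
    if impersonate:
        command += ["--impersonate", impersonate]
    return command


def run_extract(*args, **kwargs):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_extract_command(*args, **kwargs), check=True)


command = build_extract_command(
    "https://example.com",
    "content.md",
    css_selector="#fromSkipToProducts",
    impersonate="chrome",
)
# shlex.join quotes arguments so the printed line can be pasted into a shell.
print(shlex.join(command))
```

Passing a list instead of a single string avoids shell quoting problems with selectors that contain characters such as `#`.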


## Links

*   **GitHub Repository**: <a href="https://github.com/D4Vinci/Scrapling" target="_blank">D4Vinci/Scrapling</a>
*   **Official Documentation**: <a href="https://scrapling.readthedocs.io/en/latest/" target="_blank">Scrapling ReadTheDocs</a>
*   **PyPI Project Page**: <a href="https://pypi.org/project/scrapling/" target="_blank">Scrapling on PyPI</a>