Scraperr: A Powerful Self-Hosted Web Scraping Solution
This repository profile is provided by osrepos.com, an open source repository discovery platform.
Summary
Scraperr is a self-hosted web scraping tool that lets users extract data from websites without writing code. It features XPath-based extraction, queue management, domain spidering, and export of results to markdown or CSV.
Introduction
Scraperr is an open-source, self-hosted web scraper designed to simplify data extraction from websites. Instead of requiring code, it offers a web interface for defining and managing scraping jobs. It is built with TypeScript, FastAPI, Next.js, and MongoDB. Key features include XPath-based element targeting, queue management for multiple jobs, domain spidering, custom headers, media downloads, and a structured view of results.
Installation
Getting Scraperr up and running is straightforward, with primary deployment options via Docker and Helm.
Docker
For a quick setup using Docker, navigate to the project directory and run the following command:
make up
This command will orchestrate the necessary services to launch Scraperr.
Helm
For Kubernetes deployments, Scraperr provides Helm charts. Detailed instructions for Helm deployment can be found in the official documentation: https://scraperr-docs.pages.dev
Examples
Scraperr empowers users to scrape websites without writing any code. Once deployed, you can access its web interface to configure scraping tasks. Users can define scraping jobs by specifying URLs and using XPath expressions to precisely target and extract desired data elements. The tool supports advanced features like scraping all pages within the same domain (domain spidering) and automatically downloading images, videos, and other media linked on the pages. After a job completes, the scraped data is presented in a structured table format within the interface, ready for review and export in markdown or CSV formats.
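To make the XPath and export steps above concrete, here is a minimal Python sketch of the general technique. It is not Scraperr's implementation: the page fragment, the FIELDS mapping, and the helper names are all hypothetical, and it uses only the limited XPath subset of Python's standard-library ElementTree on well-formed markup, whereas a real scraper would normally use a full HTML parser.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment standing in for a fetched page.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2 class="name">Widget</h2>
      <span class="price">9.99</span>
    </div>
    <div class="product">
      <h2 class="name">Gadget</h2>
      <span class="price">19.50</span>
    </div>
  </body>
</html>
"""

# Hypothetical job definition: column name -> XPath relative to each product.
FIELDS = {
    "name": "./h2[@class='name']",
    "price": "./span[@class='price']",
}


def extract_rows(page):
    """Apply each XPath in FIELDS to every product element in the page."""
    root = ET.fromstring(page)
    rows = []
    for product in root.findall(".//div[@class='product']"):
        row = {}
        for column, xpath in FIELDS.items():
            node = product.find(xpath)
            # Missing elements become empty cells instead of raising.
            row[column] = (node.text or "").strip() if node is not None else ""
        rows.append(row)
    return rows


def to_csv(rows):
    """Render the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELDS), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_markdown(rows):
    """Render the extracted rows as a markdown table."""
    header = "| " + " | ".join(FIELDS) + " |"
    divider = "| " + " | ".join("---" for _ in FIELDS) + " |"
    body = ["| " + " | ".join(row[c] for c in FIELDS) + " |" for row in rows]
    return "\n".join([header, divider, *body])


if __name__ == "__main__":
    rows = extract_rows(PAGE)
    print(to_csv(rows))
    print(to_markdown(rows))
```

Each XPath expression plays the role of one column in the results table, which is why the same rows can be written out as either CSV or markdown without further processing.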
Why Use Scraperr?
Several features make Scraperr a practical choice for web scraping:
- No-Code Scraping: Extract data efficiently without writing a single line of code, making it accessible to a broader audience.
- Self-Hosted Control: Maintain full control over your scraping infrastructure and data, ensuring privacy and compliance.
- Powerful Features: Benefit from XPath-based extraction, queue management, domain spidering, custom headers, and media downloads.
- Data Visualization & Export: Easily view scraped data in a structured table and export it in convenient formats like markdown and CSV.
- Ethical Guidelines: The project emphasizes responsible scraping practices, encouraging users to respect robots.txt, terms of service, and rate limiting.
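As an illustration of the domain spidering and robots.txt points in the list above, the following Python sketch shows one way a polite same-domain link filter could work. It is an assumption-laden example rather than Scraperr's own code: the robots.txt content, the user agent name, and the helper functions are hypothetical, and only standard-library modules are used.

```python
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-bot"  # hypothetical user agent name


def build_robots(robots_txt):
    """Parse robots.txt text into a standard-library RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def same_domain(url, start_url):
    """Return True when both URLs share the same host (and port)."""
    return urlparse(url).netloc == urlparse(start_url).netloc


def next_urls(start_url, hrefs, robots):
    """Resolve links found on a page and keep those a polite spider may visit."""
    allowed = []
    for href in hrefs:
        url = urljoin(start_url, href)  # resolve relative links
        if not same_domain(url, start_url):
            continue  # domain spidering stays on the starting site
        if not robots.can_fetch(USER_AGENT, url):
            continue  # skip paths that robots.txt disallows
        if url not in allowed:
            allowed.append(url)  # drop duplicates, keep discovery order
    return allowed


if __name__ == "__main__":
    robots = build_robots(ROBOTS_TXT)
    links = ["/about", "/private/data", "https://other.example.org/page", "contact.html"]
    print(next_urls("https://example.com/index.html", links, robots))
    # A crawler could sleep this many seconds between requests for rate limiting.
    print(robots.crawl_delay(USER_AGENT))
```

Checking the domain before the robots.txt rules keeps the filter cheap for off-site links, and the crawl delay read from robots.txt gives a simple starting point for rate limiting.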
Links
- GitHub Repository: https://github.com/jaypyles/Scraperr
- Official Documentation: https://scraperr-docs.pages.dev
- Join the Community (Discord): https://discord.gg/89q7scsGEK
Related repositories
Similar repositories that may be relevant next.

PinchTab: High-Performance Browser Automation for AI Agents
June 21, 2026
PinchTab is a high-performance browser automation bridge and multi-instance orchestrator, designed to give AI agents direct control over Chrome. Built in Go, it offers advanced stealth injection, real-time dashboards, and token-efficient web interaction. It supports both headless and headed modes, enabling robust and secure automation workflows for various applications.

CloakBrowser: Stealth Chromium for Unblockable Web Scraping and Automation
May 27, 2026
CloakBrowser is a powerful, open-source stealth Chromium browser engineered to bypass advanced bot detection systems. It achieves unparalleled stealth through C++ source-level fingerprint patches, making it appear as a normal browser and passing over 30 detection tests. Designed as a drop-in replacement for Playwright and Puppeteer, it simplifies web automation for AI agents, web scraping, and more.

AI Website Cloner Template: Clone Websites with AI Coding Agents
May 26, 2026
The AI Website Cloner Template is an innovative open-source project that leverages AI coding agents to reverse-engineer any website into a clean, modern Next.js codebase. It enables users to clone entire websites with a single command, extracting design tokens, assets, and reconstructing sections in parallel. This tool is ideal for platform migration, recovering lost source code, or learning web development by deconstructing live sites.

Cheerio: Fast and Flexible HTML/XML Parsing and Manipulation Library
May 13, 2026
Cheerio is a popular library for parsing and manipulating HTML and XML documents in Node.js. It provides a jQuery-like API, making it easy to select, traverse, and modify elements with proven syntax. Known for its blazingly fast performance and incredible flexibility, Cheerio is an excellent choice for web scraping and server-side DOM manipulation.