AnyCrawl: A High-Performance Node.js/TypeScript Web Crawler for LLM Data

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

AnyCrawl is a Node.js/TypeScript web crawler that turns websites into LLM-ready data. It extracts structured SERP results from search engines and uses native multi-threading for bulk processing, which suits it to large-scale data collection.

Repository Information

Analyzed by OSRepos on October 12, 2025

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

AnyCrawl is a high-performance Node.js/TypeScript web crawler and scraping toolkit. It converts raw website content into structured, LLM-ready data for AI development and data analysis. It supports full-site crawling, single-page scraping, and structured SERP (Search Engine Results Page) extraction from search engines such as Google. Native multi-threading keeps bulk tasks fast and scalable.

Installation

The simplest way to self-host AnyCrawl is with Docker Compose, which handles deployment and setup in one step.

To run AnyCrawl via Docker Compose:

docker compose up -d

If you enable authentication, generate an API key by running the following command inside the running container:

docker compose exec api pnpm --filter api key:generate -- default

For more detailed installation instructions and configuration options, refer to the official documentation.

Examples

AnyCrawl exposes HTTP APIs for different scraping needs. Two examples follow:

Web Scraping with LLM Extraction

Besides scraping a page, AnyCrawl can use an LLM to extract structured data from it, guided by a JSON schema supplied in the request.

curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer YOUR_ANYCRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "is_open_source": { "type": "boolean" },
          "employee_count": { "type": "number" }
        },
        "required": ["company_mission"]
      }
    }
  }'
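
For readers calling the API from Node.js, the same scrape request can be sketched in TypeScript. This is a minimal sketch, assuming Node.js 18 or later for the built-in fetch. The endpoint, headers, and body fields are taken from the curl command above; the names buildScrapeRequest, scrape, JsonSchema, and companySchema are hypothetical, and the response is left untyped because its shape is not described here.

```typescript
// Sketch only: mirrors the curl request above. Function and type names are
// illustrative assumptions, not part of any AnyCrawl SDK.

type JsonSchema = {
  type: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
};

interface ScrapeRequest {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the request without sending it, so the payload can be inspected or logged.
function buildScrapeRequest(apiKey: string, url: string, schema: JsonSchema): ScrapeRequest {
  return {
    endpoint: "https://api.anycrawl.dev/v1/scrape",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url, json_options: { schema } }),
    },
  };
}

// Send the request. The response shape is an unknown here, so it is returned as-is.
async function scrape(apiKey: string, url: string, schema: JsonSchema): Promise<unknown> {
  const { endpoint, init } = buildScrapeRequest(apiKey, url, schema);
  const response = await fetch(endpoint, init);
  if (!response.ok) {
    throw new Error(`Scrape request failed with HTTP ${response.status}`);
  }
  return response.json();
}

// The same schema as in the curl example.
const companySchema: JsonSchema = {
  type: "object",
  properties: {
    company_mission: { type: "string" },
    is_open_source: { type: "boolean" },
    employee_count: { type: "number" },
  },
  required: ["company_mission"],
};

// Example call (performs a network request, so it is left commented out):
// scrape("YOUR_ANYCRAWL_API_KEY", "https://example.com", companySchema).then(console.log);
```

Separating request construction from sending keeps the payload easy to check before any network call is made.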

Search Engine Results (SERP)

Extract structured search results from engines such as Google.

curl -X POST https://api.anycrawl.dev/v1/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "query": "AnyCrawl",
  "limit": 10,
  "engine": "google",
  "lang": "all"
}'
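
The search request can also be sketched in TypeScript, with the request body validated before it is sent. This is a minimal sketch, assuming Node.js 18 or later for the built-in fetch. The endpoint and the query, limit, engine, and lang fields come from the curl command above; treating every field except query as optional, the validation rules, and the names SearchParams, buildSearchBody, and search are assumptions made for illustration.

```typescript
// Sketch only: the fields mirror the curl example above. Optionality and the
// validation rules are illustrative assumptions, not documented API behavior.

interface SearchParams {
  query: string;
  limit?: number;
  engine?: string;
  lang?: string;
}

// Validate the parameters and serialize them into the JSON request body.
function buildSearchBody(params: SearchParams): string {
  if (params.query.trim() === "") {
    throw new Error("query must not be empty");
  }
  if (params.limit !== undefined && (!Number.isInteger(params.limit) || params.limit < 1)) {
    throw new Error("limit must be a positive integer");
  }
  // JSON.stringify omits properties whose value is undefined.
  return JSON.stringify(params);
}

// Send the search request. The response shape is not described here, so it is returned untyped.
async function search(apiKey: string, params: SearchParams): Promise<unknown> {
  const response = await fetch("https://api.anycrawl.dev/v1/search", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: buildSearchBody(params),
  });
  if (!response.ok) {
    throw new Error(`Search request failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Example call (performs a network request, so it is left commented out):
// search("YOUR_ANYCRAWL_API_KEY", { query: "AnyCrawl", limit: 10, engine: "google", lang: "all" }).then(console.log);
```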

You can test these APIs and generate code in your preferred language using the AnyCrawl Playground.

Why Use AnyCrawl?

AnyCrawl stands out for several reasons:

  • LLM-Ready Data: It transforms raw HTML into clean, structured data optimized for Large Language Models, simplifying your AI workflows.
  • High Performance: Leveraging native multi-threading and multi-process capabilities, AnyCrawl handles bulk tasks efficiently and reliably.
  • Versatile Scraping: From full-site traversal to single-page content extraction and structured SERP results, it covers a wide range of web data needs.
  • Ease of Integration: Built with Node.js and TypeScript, it's easy to integrate into existing projects and offers a clear API.
  • Scalability: Designed for batch processing, it can scale to meet demanding data collection requirements.
