# AnyCrawl: A High-Performance Node.js/TypeScript Web Crawler for LLM Data

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Repository profile: https://osrepos.com/repo/any4ai-anycrawl

GitHub: https://github.com/any4ai/AnyCrawl

## Summary

AnyCrawl is a Node.js/TypeScript web crawler that turns websites into LLM-ready data. It extracts structured SERP results from search engines and uses native multi-threading for bulk processing, which suits large-scale data collection.

## Topics

- web-crawling
- data-extraction
- llm-data
- serp-scraping
- typescript
- nodejs
- ai-tools
- web-scraping

## Repository Information

Last analyzed by OSRepos: Sun Oct 12 2025 07:20:38 GMT+0100 (Western European Summer Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Introduction

AnyCrawl is a high-performance Node.js/TypeScript web crawler and scraping toolkit that converts raw website content into structured, LLM-ready data for AI development and data analysis. It supports three kinds of operation: full-site crawling, single-page scraping, and structured SERP (Search Engine Results Page) extraction from search engines such as Google. Native multi-threading lets it process bulk tasks in parallel.

## Installation

AnyCrawl can be self-hosted with Docker Compose, which simplifies deployment and setup.

To run AnyCrawl via Docker Compose:

```bash
docker compose up -d
```


If you enable authentication, you'll need to generate an API key. You can do this by executing a command within the running Docker container:

```bash
docker compose exec api pnpm --filter api key:generate -- default
```


For more detailed installation instructions and configuration options, refer to the [official documentation](https://docs.anycrawl.dev).

## Examples

AnyCrawl exposes HTTP APIs for different scraping needs. The two examples below cover LLM-based extraction from a page and search engine results.

### Web Scraping with LLM Extraction

In addition to scraping a page, AnyCrawl can use an LLM to extract structured data from it, following a JSON schema supplied in the request.

```bash
curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer YOUR_ANYCRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "is_open_source": { "type": "boolean" },
          "employee_count": { "type": "number" }
        },
        "required": ["company_mission"]
      }
    }
  }'
```
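
The curl request above can also be sketched in TypeScript. This is a minimal illustration rather than an official client: `buildScrapeRequest` and `scrape` are hypothetical helper names, the optional `baseUrl` parameter is an assumption added so a self-hosted instance could be targeted, and the response is left as `unknown` because its shape is not described here. It assumes Node.js 18 or later for the global `fetch`.

```typescript
// Hypothetical helpers; only the endpoint path, headers, and body fields
// are taken from the curl example.

type JsonSchema = {
  type: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
};

interface ScrapeRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the request without sending it, so it can be inspected or logged first.
function buildScrapeRequest(
  apiKey: string,
  url: string,
  schema: JsonSchema,
  baseUrl: string = "https://api.anycrawl.dev", // assumption: overridable for self-hosting
): ScrapeRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    endpoint: `${baseUrl.replace(/\/+$/, "")}/v1/scrape`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url, json_options: { schema } }),
    },
  };
}

// Sends the request and returns the parsed JSON body as `unknown`.
async function scrape(request: ScrapeRequest): Promise<unknown> {
  const response = await fetch(request.endpoint, request.init);
  if (!response.ok) {
    throw new Error(`Scrape request failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Same schema as in the curl example.
const companySchema: JsonSchema = {
  type: "object",
  properties: {
    company_mission: { type: "string" },
    is_open_source: { type: "boolean" },
    employee_count: { type: "number" },
  },
  required: ["company_mission"],
};

const scrapeRequest = buildScrapeRequest(
  "YOUR_ANYCRAWL_API_KEY",
  "https://example.com",
  companySchema,
);

// To send it: scrape(scrapeRequest).then(console.log).catch(console.error);
```

Separating request construction from sending keeps the network call in one place and makes the payload easy to check before any API quota is spent.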


### Search Engine Results (SERP)

Extract structured search results from engines such as Google.

```bash
curl -X POST "https://api.anycrawl.dev/v1/search" \
  -H "Authorization: Bearer YOUR_ANYCRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "AnyCrawl",
    "limit": 10,
    "engine": "google",
    "lang": "all"
  }'
```
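
A TypeScript version of the same search call might look like the sketch below. The helper names `buildSearchRequest` and `search` are hypothetical, the empty-query check is a client-side choice rather than documented API behaviour, and `engine` is typed as a plain string because `"google"` is the only value shown here. It assumes Node.js 18 or later for the global `fetch`.

```typescript
// Hypothetical helpers; only the endpoint path, headers, and body fields
// are taken from the curl example.

interface SearchOptions {
  query: string;
  limit?: number;
  engine?: string; // "google" is the only value shown in the example
  lang?: string;
}

interface SearchRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the request without sending it.
function buildSearchRequest(
  apiKey: string,
  options: SearchOptions,
  baseUrl: string = "https://api.anycrawl.dev", // assumption: overridable for self-hosting
): SearchRequest {
  // Client-side guard (an assumption, not documented API behaviour).
  if (options.query.trim() === "") {
    throw new Error("query must not be empty");
  }
  return {
    endpoint: `${baseUrl.replace(/\/+$/, "")}/v1/search`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // Fields left undefined are omitted by JSON.stringify.
      body: JSON.stringify(options),
    },
  };
}

// Sends the request; the response shape is not described here, hence `unknown`.
async function search(request: SearchRequest): Promise<unknown> {
  const response = await fetch(request.endpoint, request.init);
  if (!response.ok) {
    throw new Error(`Search request failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Same parameters as in the curl example.
const searchRequest = buildSearchRequest("YOUR_ANYCRAWL_API_KEY", {
  query: "AnyCrawl",
  limit: 10,
  engine: "google",
  lang: "all",
});

// To send it: search(searchRequest).then(console.log).catch(console.error);
```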


You can test these APIs and generate code in your preferred language using the [AnyCrawl Playground](https://anycrawl.dev/playground).

## Why Use AnyCrawl?

AnyCrawl offers the following:

*   **LLM-Ready Data**: It transforms raw HTML into clean, structured data suited to Large Language Models, which removes a preprocessing step from AI workflows.
*   **High Performance**: Native multi-threading and multi-process execution let it work through bulk tasks in parallel.
*   **Versatile Scraping**: It covers full-site traversal, single-page content extraction, and structured SERP results.
*   **Ease of Integration**: It is built with Node.js and TypeScript and exposes an HTTP API, so it fits into existing projects.
*   **Scalability**: It is designed for batch processing and can scale up for larger data collection jobs.

## Links

*   **GitHub Repository**: [any4ai/AnyCrawl](https://github.com/any4ai/AnyCrawl)
*   **Official Documentation**: [docs.anycrawl.dev](https://docs.anycrawl.dev)
*   **API Playground**: [anycrawl.dev/playground](https://anycrawl.dev/playground)