Repository History

3 repositories tagged with web scraping

Topic: web scraping

DeepScrape: Intelligent Web Scraping & LLM-Powered Data Extraction

DeepScrape is a web scraping tool that uses LLMs to extract structured data from web pages. It uses Playwright for browser automation and supports both cloud (OpenAI) and local (Ollama, vLLM) LLMs for transforming web content into structured JSON. It is aimed at modern web applications, RAG pipelines, and other data workflows, with local models available for privacy-first data processing.

Analyzed Dec 19, 2025

Newspaper3k: Advanced News and Article Extraction in Python

Newspaper3k is a Python 3 library for extracting news articles, their full text, and their metadata. Inspired by the simplicity of `requests` and the speed of `lxml`, it provides tools for scraping and curating articles from various sources. It suits developers who need to gather and process news content programmatically, including NLP steps such as keyword extraction and summarization.

Analyzed Oct 13, 2025

Pipet: A Swiss-Army Tool for Web Scraping and Data Extraction

Pipet is a command-line web scraper designed for hackers, for extracting data from online assets. It supports three modes: HTML parsing, JSON parsing, and client-side JavaScript evaluation, and it builds on existing tools such as `curl` and `playwright`. It is suited to tracking information, monitoring changes, and automating data collection tasks.

Analyzed Oct 12, 2025
