Repository History

Explore all analyzed open source repositories

Topic: data extraction
DeepScrape: Intelligent Web Scraping & LLM-Powered Data Extraction

DeepScrape is a web scraping tool that uses LLMs to extract structured data. It drives the browser with Playwright and supports both cloud (OpenAI) and local (Ollama, vLLM) models for transforming web content into structured JSON. It targets modern web applications, RAG pipelines, and other data workflows, and offers privacy-first data processing.

Dec 19, 2025

feedparser: A Robust Python Library for Parsing Feeds

feedparser is a widely used, reliable Python library for parsing Atom and RSS feeds. It simplifies extracting data from various feed formats, which makes it useful for developers working with syndicated content. With extensive testing and clear documentation, feedparser offers a straightforward way to consume feeds in Python applications.

Nov 10, 2025

Newspaper3k: Advanced News and Article Extraction in Python

Newspaper3k is a Python 3 library for extracting news articles, full text, and article metadata. Inspired by the simplicity of 'requests' and the speed of 'lxml', it provides tools for scraping and curating articles from various sources. It suits developers who need to gather and process news content programmatically, and it includes NLP features such as keyword extraction and summarization.

Oct 13, 2025

Pipet: A Swiss-Army Tool for Web Scraping and Data Extraction

Pipet is a versatile command-line web scraper designed for hackers, enabling efficient data extraction from various online assets. It supports HTML parsing, JSON parsing, and client-side JavaScript evaluation, leveraging existing tools like `curl` and `playwright` for powerful and flexible scraping operations. This tool is ideal for tracking information, monitoring changes, and automating data collection tasks.

Oct 12, 2025