Repository History

4 repositories tagged with data extraction

Topic: data extraction

DeepScrape: Intelligent Web Scraping & LLM-Powered Data Extraction

DeepScrape is a web scraping tool that uses LLMs for data extraction. It drives the browser with Playwright and supports both cloud (OpenAI) and local LLMs (Ollama, vLLM) for transforming web content into structured JSON. It targets modern web applications, RAG pipelines, and other data workflows, with local models available for privacy-first data processing.

Analyzed Dec 19, 2025

feedparser: A Robust Python Library for Parsing Feeds

feedparser is a widely used Python library for parsing Atom and RSS feeds. It extracts feed metadata and entries from several feed formats, which makes it a common choice for developers working with syndicated content. It is extensively tested and clearly documented, and offers a straightforward way to consume feeds in Python applications.
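To make "parsing a feed" concrete, here is a minimal sketch of pulling the title and entries out of an RSS 2.0 document. It deliberately uses only Python's standard-library `xml.etree.ElementTree` and is not feedparser's implementation; the `parse_rss` helper and the `SAMPLE_RSS` document are hypothetical names invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for this illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def parse_rss(xml_text):
    """Return the channel title and a list of entries from RSS 2.0 text.

    Stdlib-only stand-in for illustration; it handles well-formed
    RSS 2.0 only (no Atom, no malformed input).
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: missing <channel>")

    entries = []
    for item in channel.findall("item"):
        entries.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            }
        )
    return {"title": channel.findtext("title", default=""), "entries": entries}


if __name__ == "__main__":
    feed = parse_rss(SAMPLE_RSS)
    print(feed["title"])
    for entry in feed["entries"]:
        print(entry["title"], entry["link"])
```

feedparser exposes comparable data through `feedparser.parse(...)`, whose result carries feed-level fields under `.feed` and items under `.entries`. Unlike this sketch, it also covers Atom and tolerates ill-formed feeds.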

Analyzed Nov 10, 2025

Newspaper3k: Advanced News and Article Extraction in Python

Newspaper3k is a Python 3 library for extracting news articles, including full text and article metadata. Inspired by the simplicity of 'requests' and the speed of 'lxml', it provides tools for scraping and curating articles from various sources. It suits developers who need to gather news content programmatically and process it with the library's NLP features.

Analyzed Oct 13, 2025

Pipet: A Swiss-Army Tool for Web Scraping and Data Extraction

Pipet is a command-line web scraper designed for hackers that extracts data from online assets. It supports HTML parsing, JSON parsing, and client-side JavaScript evaluation, and builds on existing tools such as `curl` and `playwright`. It is suited to tracking information, monitoring changes, and automating data collection tasks.

Analyzed Oct 12, 2025

