Repository History
Explore all analyzed open source repositories

DeepScrape: Intelligent Web Scraping & LLM-Powered Data Extraction
DeepScrape is a web scraping tool that uses LLMs to extract data. It uses Playwright for browser automation and supports both cloud (OpenAI) and local (Ollama, vLLM) LLMs for transforming web content into structured JSON. It is suited to modern web applications, RAG pipelines, and other data workflows, and offers privacy-first data processing.

feedparser: A Robust Python Library for Parsing Feeds
feedparser is a widely used, reliable Python library for parsing Atom and RSS feeds. It simplifies extracting data from various feed formats, making it an essential tool for developers working with syndicated content. Extensively tested and clearly documented, feedparser offers a straightforward way to consume feeds in Python applications.

Newspaper3k: Advanced News and Article Extraction in Python
Newspaper3k is a Python 3 library for extracting news articles, their full text, and their metadata. Inspired by the simplicity of 'requests' and the speed of 'lxml', it provides robust tools for scraping and curating articles from various sources. It is suited to developers who need to gather and process news content programmatically and want built-in NLP capabilities.

Pipet: A Swiss-Army Tool for Web Scraping and Data Extraction
Pipet is a versatile command-line web scraper designed for hackers, enabling efficient data extraction from various online assets. It supports HTML parsing, JSON parsing, and client-side JavaScript evaluation, leveraging existing tools like `curl` and `playwright` for powerful and flexible scraping operations. This tool is ideal for tracking information, monitoring changes, and automating data collection tasks.