Pipet: A Swiss-Army Tool for Web Scraping and Data Extraction

This repository profile is provided by osrepos.com, an open source repository discovery platform.


Summary

Pipet is a versatile command-line web scraper designed for hackers, enabling efficient data extraction from various online assets. It supports HTML parsing, JSON parsing, and client-side JavaScript evaluation, leveraging existing tools like `curl` and `playwright` for powerful and flexible scraping operations. This tool is ideal for tracking information, monitoring changes, and automating data collection tasks.

Repository Information

Analyzed by OSRepos on October 12, 2025


Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

Pipet is a flexible command-line web scraper, often described as a "swiss-army tool" for extracting data from online assets. Built with hackers in mind, it supports three modes of operation: HTML parsing, JSON parsing, and client-side JavaScript evaluation. Rather than reimplementing HTTP or browser automation, it delegates fetching to existing tools such as curl and playwright, and it uses Unix pipes to extend its built-in capabilities. Typical uses include tracking shipments, monitoring stock prices, and getting notified when concert tickets become available.
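The three modes can be pictured as a dispatch step that runs after a resource is fetched. The Go sketch below is a hypothetical illustration, not Pipet's source: the `Mode` type, the `detectMode` function, and the rule "playwright resources use JavaScript, JSON bodies use JSON queries, everything else is HTML" are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Mode is the kind of extraction a block uses (hypothetical type).
type Mode int

const (
	ModeHTML Mode = iota // CSS selectors over an HTML document
	ModeJSON             // path queries over a JSON document
	ModeJS               // JavaScript evaluated in a headless browser
)

func (m Mode) String() string {
	switch m {
	case ModeHTML:
		return "html"
	case ModeJSON:
		return "json"
	default:
		return "js"
	}
}

// detectMode picks a mode from the block's resource line and, for curl
// resources, from the fetched body. This heuristic is an assumption,
// not Pipet's actual detection logic.
func detectMode(resourceLine string, body []byte) Mode {
	if strings.HasPrefix(strings.TrimSpace(resourceLine), "playwright") {
		return ModeJS
	}
	if json.Valid(body) {
		return ModeJSON
	}
	return ModeHTML
}

func main() {
	fmt.Println(detectMode("curl https://news.ycombinator.com/", []byte("<html></html>")))
	fmt.Println(detectMode("curl https://api.example.com/items", []byte(`{"items":[]}`)))
	fmt.Println(detectMode("playwright https://example.com/", nil))
}
```

Checking the body with `json.Valid` from the standard library keeps the sketch dependency-free; a real tool might instead consult the response's `Content-Type` header.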

Installation

Getting started with Pipet is straightforward, with several installation options available:

Pre-built Binaries

The easiest way to install is by downloading the latest release for your operating system from the official Releases page. After downloading, make the binary executable with `chmod +x pipet` and run it with `./pipet`.

Compile from Source

If you have Go installed on your system, you can compile and install Pipet directly:

go install github.com/bjesus/pipet/cmd/pipet@latest

Alternatively, you can run it without installing it by using `go run github.com/bjesus/pipet/cmd/pipet@latest`, followed by the path to a `.pipet` file.

Package Managers

Pipet is also packaged for several package managers; check the repository's README for the current list.

Examples

Pipet is driven by `.pipet` files, which define where to fetch data and how to extract it. Here's a quick example that scrapes Hacker News:

  1. Create a file named hackernews.pipet with the following content:
    curl https://news.ycombinator.com/
    .title .titleline
      span > a
      .sitebit a
    
  2. Run Pipet:
    go run github.com/bjesus/pipet/cmd/pipet@latest hackernews.pipet
    # Or, if installed:
    pipet hackernews.pipet

This will display the latest Hacker News titles and their associated domains directly in your terminal.
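In this example, the indentation groups `span > a` and `.sitebit a` under `.title .titleline`, so the two sub-queries are evaluated for each matched title line. A minimal sketch of reading that structure in Go, assuming a single block and space-only indentation; the `Block`, `Query`, and `parseBlock` names are hypothetical, and Pipet's real parser may handle more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// Query is one selector line plus the queries indented beneath it.
type Query struct {
	Selector string
	Children []*Query
}

// Block is a resource line followed by its query tree.
type Block struct {
	Resource string
	Queries  []*Query
}

// parseBlock reads a single block: the first non-empty line is the
// resource, and each later line's count of leading spaces sets its depth.
func parseBlock(src string) Block {
	var b Block
	type frame struct {
		indent int
		q      *Query
	}
	var stack []frame // currently open parent queries, shallowest first
	for _, line := range strings.Split(src, "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		if b.Resource == "" {
			b.Resource = strings.TrimSpace(line)
			continue
		}
		indent := len(line) - len(strings.TrimLeft(line, " "))
		q := &Query{Selector: strings.TrimSpace(line)}
		// Pop until the top of the stack is indented less than this line.
		for len(stack) > 0 && stack[len(stack)-1].indent >= indent {
			stack = stack[:len(stack)-1]
		}
		if len(stack) == 0 {
			b.Queries = append(b.Queries, q)
		} else {
			parent := stack[len(stack)-1].q
			parent.Children = append(parent.Children, q)
		}
		stack = append(stack, frame{indent, q})
	}
	return b
}

func main() {
	src := "curl https://news.ycombinator.com/\n.title .titleline\n  span > a\n  .sitebit a\n"
	b := parseBlock(src)
	fmt.Println("resource:", b.Resource)
	for _, q := range b.Queries {
		fmt.Println(q.Selector)
		for _, c := range q.Children {
			fmt.Println("  " + c.Selector)
		}
	}
}
```

Running `main` prints the resource line followed by the query tree, with the two sub-queries indented under `.title .titleline`. A stack of open parents is used because indentation can deepen by one level but unwind by several levels at once.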

Pipet offers many advanced features, including:

  • Custom Separators: Use the `--separator` flag to control how extracted values are separated in text output.
  • JSON Output: Get results as a JSON structure with the `--json` flag.
  • Templating: Render results into custom HTML or text templates.
  • Unix Pipes Integration: Pass query results through other command-line tools such as `wc` or `htmlq`.
  • Monitoring: Set an interval and a command to run when the results change, so you can track dynamic content.

Why Use Pipet?

Pipet stands out for its versatility and hacker-friendly design. Because it handles HTML, JSON, and JavaScript-rendered content, it covers static pages, APIs, and client-rendered sites alike. By delegating complex HTTP requests to curl and headless browser automation to playwright, it gains those capabilities without reinventing them. Unix pipes let it slot into existing shell workflows for custom data processing, and its monitoring features make it useful for keeping up with changing online information, from personal alerts to business intelligence.

Related repositories

Similar repositories that may be relevant next.

no-mistakes: AI-Driven Git Proxy for Flawless Pull Requests

June 30, 2026

no-mistakes is an innovative Git proxy that streamlines the pull request workflow by ensuring code quality before it reaches your remote. It uses an AI-driven validation pipeline in a disposable worktree, automatically applying safe fixes and escalating complex issues for human review. This tool helps developers maintain clean, high-quality codebases and open perfect PRs effortlessly.

Topics: Git, AI, Developer Tools

Gogcli: Google Workspace Management from Your Terminal

June 24, 2026

Gogcli is a powerful command-line interface for Google Workspace, allowing users to manage Gmail, Calendar, Drive, Docs, Sheets, and many other services directly from their terminal. It is designed for both interactive use and robust automation, providing predictable output, agent safety features, and support for multiple accounts.

Topics: gcal, gcontacts, gdrive

PinchTab: High-Performance Browser Automation for AI Agents

June 21, 2026

PinchTab is a high-performance browser automation bridge and multi-instance orchestrator, designed to give AI agents direct control over Chrome. Built in Go, it offers advanced stealth injection, real-time dashboards, and token-efficient web interaction. It supports both headless and headed modes, enabling robust and secure automation workflows for various applications.

Topics: browser-automation, Go, headless-chrome

Multigres: Vitess Adaptation for Scalable Postgres Databases

June 3, 2026

Multigres is an innovative project that adapts Vitess for use with PostgreSQL, aiming to bring advanced sharding and scalability features to Postgres environments. Currently in early development, it offers a promising solution for managing large-scale Postgres deployments. Users can explore its capabilities and contribute to its growth.

Topics: Go, Postgres, Vitess

Source repository

Open the original repository on GitHub: https://github.com/bjesus/pipet

Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️