# Pipet: A Swiss-Army Tool for Web Scraping and Data Extraction

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/bjesus-pipet
Generated for open source discovery and AI-assisted research.

Pipet is a versatile command-line web scraper designed for hackers, enabling efficient data extraction from various online assets. It supports HTML parsing, JSON parsing, and client-side JavaScript evaluation, leveraging existing tools like `curl` and `playwright` for powerful and flexible scraping operations. This tool is ideal for tracking information, monitoring changes, and automating data collection tasks.

GitHub: https://github.com/bjesus/pipet

## Topics

- Go
- web scraping
- data extraction
- scraper
- scraping
- curl
- playwright
- json

## Repository Information

Last analyzed by OSRepos: Sun Oct 12 2025 05:50:41 GMT+0100 (Western European Summer Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Introduction

Pipet is a command-line web scraper, often described as a "swiss-army tool" for extracting data from online assets. Built with hackers in mind, it supports three modes of operation: HTML parsing, JSON parsing, and client-side JavaScript evaluation. Pipet relies on existing tools such as `curl` and `playwright`, and uses Unix pipes to extend its built-in capabilities. Typical uses include tracking shipments, monitoring stock prices, and getting notified about concert tickets.

## Installation

Getting started with Pipet is straightforward, with several installation options available:

### Pre-built Binaries
The easiest way to install Pipet is to download the latest release for your operating system from the official [Releases page](https://github.com/bjesus/pipet/releases/ "Pipet Releases"). After downloading, make the binary executable with `chmod +x pipet` and run it with `./pipet`.

### Compile from Source
If you have Go installed on your system, you can compile and install Pipet directly:
```bash
go install github.com/bjesus/pipet/cmd/pipet@latest
```

Alternatively, you can run it without a full installation using `go run`.

### Package Managers
Pipet is also available through various package managers:
*   **Arch Linux**: [pipet-git](https://aur.archlinux.org/packages/pipet-git "Pipet on Arch Linux AUR")
*   **Homebrew**: [pipet](https://formulae.brew.sh/formula/pipet "Pipet on Homebrew")
*   **Nix**: [pipet](https://search.nixos.org/packages?channel=unstable&show=pipet&from=0&size=50&sort=relevance&type=packages&query=pipet "Pipet on Nix")

## Examples

Pipet's strength lies in its intuitive `.pipet` files, which define how and where to extract data. Here's a quick example to scrape Hacker News:

1.  Create a file named `hackernews.pipet` with the following content:

    ```
    curl https://news.ycombinator.com/
    .title .titleline
      span > a
      .sitebit a
    ```

2.  Run Pipet:

    ```bash
    go run github.com/bjesus/pipet/cmd/pipet@latest hackernews.pipet
    # Or, if installed:
    pipet hackernews.pipet
    ```

    This will display the latest Hacker News titles and their associated domains directly in your terminal.
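
The two steps above can also be run as a single script. The following is a minimal sketch, not an official workflow: it writes the same `hackernews.pipet` file into a temporary directory and only invokes `pipet` if the binary is found on `PATH`. The `workdir` and `pipet_file` variable names and the use of a scratch directory are illustrative choices, not part of Pipet.

```bash
# Illustrative scratch directory; any writable location works.
workdir="$(mktemp -d)"
pipet_file="$workdir/hackernews.pipet"

# Step 1: write the .pipet file from the example above.
# The first line names the resource to fetch; the remaining lines are
# CSS selectors, with indentation nesting them under the parent line.
cat > "$pipet_file" <<'EOF'
curl https://news.ycombinator.com/
.title .titleline
  span > a
  .sitebit a
EOF

# Step 2: run Pipet only if it is installed.
if command -v pipet >/dev/null 2>&1; then
  pipet "$pipet_file"
else
  echo "pipet not found on PATH; install it first" >&2
fi
```

Quoting the heredoc delimiter (`'EOF'`) keeps the shell from expanding anything inside the file body, so the selectors are written exactly as typed.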

Pipet offers many advanced features, including:
*   **Custom Separators**: Use the `--separator` flag to format output.
*   **JSON Output**: Get results as a clean JSON structure with the `--json` flag.
*   **Templating**: Render results into custom HTML or text templates.
*   **Unix Pipes Integration**: Extend functionality by piping data to other command-line tools like `wc` or `htmlq`.
*   **Monitoring**: Set intervals and commands to run on changes, allowing you to track dynamic content.
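
A few of these features can be sketched in one script. The `--json` and `--separator` flags are the ones named in the list; everything else here is an assumption made for illustration: the `| wc -c` pipe at the end of a query line, the file name `titles.pipet`, and the separator value. Check the project README before relying on the exact syntax.

```bash
# Hypothetical file name; the selectors reuse the Hacker News example.
pipet_file="$(mktemp -d)/titles.pipet"

# Assumption: a query line may end with "| command", in which case the
# matched content is piped through that command (here, wc -c for a byte count).
cat > "$pipet_file" <<'EOF'
curl https://news.ycombinator.com/
.title .titleline
  span > a
  span > a | wc -c
EOF

if command -v pipet >/dev/null 2>&1; then
  # Structured output via the --json flag, saved next to the .pipet file.
  pipet --json "$pipet_file" > "${pipet_file%.pipet}.json"
  # Plain-text output with a custom separator (value chosen for illustration).
  pipet --separator ' :: ' "$pipet_file"
else
  echo "pipet not found on PATH; skipping run" >&2
fi
```

The script does not inspect the JSON it saves, because the output shape is not described here; look at the file before writing a parser against it.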

## Why Use Pipet?

Pipet's main strengths are its versatility and hacker-friendly design. Because it handles HTML, JSON, and JavaScript-rendered content, it covers most common scraping targets. It delegates complex HTTP requests to `curl` and headless browser automation to `playwright` rather than reimplementing them. Unix pipes let it fit into existing workflows and custom data processing, and its monitoring features make it useful for staying updated on online information, from personal alerts to business intelligence.

## Links

*   **GitHub Repository**: [https://github.com/bjesus/pipet](https://github.com/bjesus/pipet "Pipet GitHub Repository")
*   **Go Reference**: [https://pkg.go.dev/github.com/bjesus/pipet](https://pkg.go.dev/github.com/bjesus/pipet "Pipet Go Reference")
*   **Releases Page**: [https://github.com/bjesus/pipet/releases/](https://github.com/bjesus/pipet/releases/ "Pipet Releases")