Headroom: Drastically Reduce LLM Token Usage for AI Agents

Summary

Headroom is a context compression layer for AI agents that reduces the number of tokens sent to an LLM. The project reports 60-95% fewer tokens across inputs such as tool outputs, logs, files, and RAG chunks while preserving answer accuracy, which lowers cost and frees up context window space.

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

Headroom is a context compression layer built for AI agents and Large Language Models (LLMs). It addresses high token usage and context window limits by compressing inputs, including tool outputs, logs, RAG chunks, files, and conversation history, before they reach the LLM. The project reports a 60-95% reduction in tokens while maintaining the accuracy of the LLM's responses.

Headroom runs locally, so your data stays on your machine, and it can be integrated as a Python/TypeScript library, a zero-code proxy, or an MCP server. Its core architecture routes content to specialized compressors for JSON, code (AST-aware), and general text. A reversible compression mechanism (CCR) lets the LLM retrieve the original content on demand.
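To make the routing idea concrete, here is a minimal sketch of how content could be classified and sent to a type-specific compressor. It is an illustration only, not Headroom's implementation: the names classify_content and route are hypothetical, the code check only recognizes Python source, and the compressors are placeholders you would supply yourself.

```python
import ast
import json
from typing import Callable, Dict, Tuple


def classify_content(text: str) -> str:
    """Return 'json', 'code', or 'text' for a chunk of input (hypothetical helper)."""
    stripped = text.strip()
    # Only objects and arrays count as JSON here; a bare number or string
    # also parses as JSON but is better handled as plain text.
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass
    # Treat the input as code only if it parses as Python and contains
    # at least one definition or import.
    try:
        tree = ast.parse(stripped)
    except (SyntaxError, ValueError):
        return "text"
    code_nodes = (
        ast.FunctionDef,
        ast.AsyncFunctionDef,
        ast.ClassDef,
        ast.Import,
        ast.ImportFrom,
    )
    if any(isinstance(node, code_nodes) for node in ast.walk(tree)):
        return "code"
    return "text"


def route(text: str, compressors: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Pick the compressor registered for the detected content type and apply it."""
    kind = classify_content(text)
    return kind, compressors[kind](text)


# Placeholder compressors, for illustration only.
placeholder_compressors = {
    # Re-serializing without whitespace is lossless but only a small saving.
    "json": lambda t: json.dumps(json.loads(t), separators=(",", ":")),
    "code": lambda t: t,
    "text": lambda t: " ".join(t.split()),
}

kind, compact = route('{ "status": "ok",  "items": [1, 2, 3] }', placeholder_compressors)
print(kind, compact)
```

A real router would need detection for more languages and formats; the point of the sketch is only that each content type gets a compressor suited to its structure.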

Installation

Getting started with Headroom is straightforward. It requires Python 3.10+ for the full feature set.

For Python:

pip install "headroom-ai[all]"

For Node.js / TypeScript:

npm install headroom-ai

You can also use Docker:

docker pull ghcr.io/chopratejas/headroom:latest

Examples

Headroom provides multiple ways to integrate into your existing AI workflows.

Wrap an AI agent:

headroom wrap claude

Run as a local proxy (zero code changes):

headroom proxy --port 8787
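With the proxy running, a client only needs to send its requests to the local port instead of the provider's endpoint. The sketch below builds such a request with the Python standard library. It assumes the proxy exposes an OpenAI-style /v1/chat/completions path on localhost; that path, the model name, and the API key are placeholders to check against the Headroom documentation.

```python
import json
import urllib.request

# Assumption: the proxy mirrors the OpenAI-style /v1 prefix on the port chosen above.
PROXY_BASE_URL = "http://localhost:8787/v1"


def build_chat_request(messages, model, api_key):
    """Build (but do not send) a chat completion request aimed at the local proxy."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{PROXY_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    [{"role": "user", "content": "Summarize this log: ..."}],
    model="your-model-name",  # placeholder
    api_key="YOUR_API_KEY",  # placeholder
)
print(request.full_url)

# To actually send it (requires the proxy to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Clients that accept a configurable base URL can be pointed at the same address, which is what makes the proxy route a zero-code change.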

Use as an inline library in Python:

from headroom import compress

# Example: Compress a list of messages
messages = [
    {"role": "user", "content": "Analyze this log file: ..."},
    {"role": "assistant", "content": "Processing the log..."},
]
compressed_messages = compress(messages)
# len(str(...)) counts characters, not tokens, so treat this as a rough size comparison
print(f"Original characters: {len(str(messages))}, Compressed characters: {len(str(compressed_messages))}")
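Character counts are only a proxy for token counts. The sketch below estimates tokens with the common rule of thumb of roughly four characters per token for English text and reports the percentage saved. The helper names are hypothetical, the ratio is a heuristic rather than a real tokenizer, and the before/after messages are synthetic stand-ins rather than Headroom output.

```python
def estimate_tokens(messages, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from message content length (heuristic, not a tokenizer)."""
    total_chars = sum(len(str(m.get("content", ""))) for m in messages)
    return round(total_chars / chars_per_token)


def reduction_percent(before: int, after: int) -> float:
    """Percentage of tokens saved going from `before` to `after`."""
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


# Synthetic stand-ins for a message list before and after compression.
before_messages = [{"role": "user", "content": "x" * 4000}]
after_messages = [{"role": "user", "content": "x" * 1000}]

before_tokens = estimate_tokens(before_messages)
after_tokens = estimate_tokens(after_messages)
print(f"Estimated tokens: {before_tokens} -> {after_tokens} "
      f"({reduction_percent(before_tokens, after_tokens):.1f}% fewer)")
```

For billing-grade numbers, use the tokenizer that matches your model instead of the character heuristic.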

Monitor your savings:

headroom perf
headroom dashboard # Requires proxy to be running

Why Use Headroom?

Headroom offers several advantages for anyone working with AI agents and LLMs:

Drastic Token Reduction: 60-95% fewer tokens, which lowers cost and leaves room for much larger contexts.
Accuracy Preservation: The project's benchmarks report that accuracy is maintained or even improved on standard tasks such as math, factual QA, and tool usage.
Flexible Integration: Use it as a library, a proxy, or an agent wrapper, whichever fits your development style and existing infrastructure.
Local-First & Reversible: All compression happens locally, keeping your data private. The Content-Cache-Retrieve (CCR) mechanism lets the LLM retrieve the original content if needed.
Cross-Agent Memory: Share compressed context across AI agents such as Claude, Codex, and Gemini.
Output Token Reduction: Beyond input compression, Headroom can also trim what the model writes back, further reducing cost.
Broad Compatibility: Works with popular agents and frameworks, including Claude Code, Cursor, LangChain, and any OpenAI-compatible client via its proxy.
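The reversible compression idea mentioned above can be sketched as a cache keyed by a content hash: the original is stored locally, the model sees a short reference, and the full text can be fetched by key when needed. This is a simplified illustration, not Headroom's CCR implementation. The class name, the reference format, and the truncated preview standing in for real compression are all assumptions.

```python
import hashlib


class ReversibleStore:
    """Keeps originals in memory so a compact stub can be expanded later (hypothetical)."""

    def __init__(self) -> None:
        self._originals = {}

    def compress(self, content: str, preview_chars: int = 80) -> str:
        # A short content hash serves as the retrieval key.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = content
        # A truncated preview stands in for a real compressor here.
        preview = content[:preview_chars]
        return f"[compressed ref={key} original_chars={len(content)}] {preview}"

    def retrieve(self, key: str) -> str:
        # Raises KeyError if the key was never stored.
        return self._originals[key]


store = ReversibleStore()
original = "2024-01-01 INFO request handled\n" * 500
stub = store.compress(original)

# The key is embedded in the stub, so a tool call could pass it back to retrieve().
key = stub.split("ref=")[1].split()[0]
restored = store.retrieve(key)
print(len(original), len(stub), restored == original)
```

In an agent setup, retrieve would typically be exposed as a tool so the model can request the original only when the compact form is not enough.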
