Headroom: Drastically Reduce LLM Token Usage for AI Agents
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
Headroom is a context compression layer for AI agents that reduces LLM token usage. The project reports 60-95% fewer tokens across inputs such as tool outputs, logs, files, and RAG chunks, while preserving answer accuracy.
Introduction
Headroom is a context compression layer for AI agents and Large Language Models (LLMs). It addresses high token usage and context window limits by compressing tool outputs, logs, RAG chunks, files, and conversation history before they reach the LLM. The project reports a 60-95% reduction in tokens while maintaining the accuracy of the LLM's responses.
Operating locally to ensure data privacy, Headroom offers flexible integration options as a Python/TypeScript library, a zero-code proxy, or an MCP server. Its core architecture intelligently routes content to specialized compressors for JSON, code (AST-aware), and general text, and features a reversible compression mechanism (CCR) that allows LLMs to retrieve original content on demand.
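The routing idea described above can be sketched as follows. This is a minimal illustration, not Headroom's implementation: the function names, the detection heuristic, and the three toy compressors are all hypothetical, and a real AST-aware code compressor would do far more than drop blank lines.

```python
import json


def detect_content_type(text: str) -> str:
    """Classify a payload as 'json', 'code', or 'text' (hypothetical heuristic)."""
    stripped = text.strip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass  # looked like JSON but is not; fall through to other checks
    code_markers = ("def ", "class ", "import ", "function ", "#include")
    if any(line.lstrip().startswith(code_markers) for line in stripped.splitlines()):
        return "code"
    return "text"


def compress_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))


def compress_code(text: str) -> str:
    """Drop blank lines and trailing whitespace (stand-in for AST-aware logic)."""
    return "\n".join(line.rstrip() for line in text.splitlines() if line.strip())


def compress_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


# Hypothetical routing table: content type -> specialized compressor.
ROUTES = {"json": compress_json, "code": compress_code, "text": compress_text}


def route_and_compress(text: str) -> tuple[str, str]:
    """Pick a compressor based on the detected content type and apply it."""
    kind = detect_content_type(text)
    return kind, ROUTES[kind](text)


if __name__ == "__main__":
    print(route_and_compress('{ "a": 1,  "b": [1, 2] }'))
    print(route_and_compress("hello   world\n\nagain"))
```

Keeping the detector separate from the compressors means a new content type only needs one new entry in the routing table.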
Installation
Headroom requires Python 3.10+ for the full feature set.
For Python:
pip install "headroom-ai[all]"
For Node.js / TypeScript:
npm install headroom-ai
You can also use Docker:
docker pull ghcr.io/chopratejas/headroom:latest
Examples
Headroom provides multiple ways to integrate into your existing AI workflows.
Wrap an AI agent:
headroom wrap claude
Run as a local proxy (zero code changes):
headroom proxy --port 8787
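With the proxy running, an OpenAI-compatible client would be pointed at the local port instead of the provider's endpoint. The sketch below only builds the HTTP request with the standard library and does not send it. The `/v1/chat/completions` path, the model name, and the API key are assumptions for illustration; check the Headroom documentation for the routes the proxy actually exposes.

```python
import json
import urllib.request

# Assumed base URL: the port comes from the command above, the /v1 prefix is a guess
# based on the OpenAI-compatible convention.
PROXY_BASE_URL = "http://localhost:8787/v1"


def build_chat_request(messages, model, api_key):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{PROXY_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    messages=[{"role": "user", "content": "Summarize this log: ..."}],
    model="example-model",      # placeholder model name
    api_key="sk-placeholder",   # placeholder credential
)
print(request.get_method(), request.full_url)
# To send it once the proxy is running: urllib.request.urlopen(request)
```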
Use as an inline library in Python:
from headroom import compress

# Example: compress a list of chat messages
messages = [
    {"role": "user", "content": "Analyze this log file: ..."},
    {"role": "assistant", "content": "Processing the log..."},
]
compressed_messages = compress(messages)

# len(str(...)) counts characters, not tokens, so treat this as a rough size comparison
print(f"Original characters: {len(str(messages))}, Compressed characters: {len(str(compressed_messages))}")
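The print statement above compares string lengths, not token counts. For a rough estimate, the sketch below assumes the common rule of thumb of about four characters per token for English text. `estimate_tokens` and `percent_saved` are hypothetical helpers, not part of Headroom; for exact counts, use the tokenizer that matches your model.

```python
def estimate_tokens(messages, chars_per_token=4.0):
    """Roughly estimate tokens from message content length (heuristic, not exact)."""
    total_chars = sum(len(str(m.get("content", ""))) for m in messages)
    return round(total_chars / chars_per_token)


def percent_saved(tokens_before, tokens_after):
    """Return the percentage reduction from tokens_before to tokens_after."""
    if tokens_before == 0:
        return 0.0
    return 100.0 * (tokens_before - tokens_after) / tokens_before


if __name__ == "__main__":
    original = [{"role": "user", "content": "x" * 4000}]
    shortened = [{"role": "user", "content": "x" * 1000}]
    before = estimate_tokens(original)
    after = estimate_tokens(shortened)
    print(f"~{before} -> ~{after} tokens ({percent_saved(before, after):.0f}% saved)")
```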
Monitor your savings:
headroom perf
headroom dashboard # Requires proxy to be running
Why Use Headroom?
Headroom offers compelling advantages for anyone working with AI agents and LLMs:
- Drastic Token Reduction: Achieve 60-95% fewer tokens, leading to significant cost savings and the ability to process much larger contexts.
- Accuracy Preservation: The project's benchmarks report that Headroom maintains, and in some cases improves, accuracy on standard tasks such as math, factual QA, and tool usage.
- Flexible Integration: Seamlessly integrate as a library, a proxy, or an agent wrapper, adapting to your preferred development style and existing infrastructure.
- Local-First & Reversible: All compression happens locally, keeping your data private. The Content-Cache-Retrieve (CCR) mechanism ensures original content can be retrieved by the LLM if needed.
- Cross-Agent Memory: Share compressed context across different AI agents like Claude, Codex, and Gemini, enhancing collaborative workflows.
- Output Token Reduction: Beyond input compression, Headroom can also intelligently trim what the model writes back, further optimizing costs.
- Broad Compatibility: Works with popular agents and frameworks, including Claude Code, Cursor, LangChain, and any OpenAI-compatible client via its proxy.
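The reversible compression idea behind CCR can be illustrated with a small cache. This is a sketch under stated assumptions, not Headroom's algorithm: the `ReversibleStore` class, the truncation strategy, and the placeholder format are hypothetical stand-ins chosen to show how a shortened payload can still point back to its original.

```python
import hashlib


class ReversibleStore:
    """Toy cache: shorten long content, keep the original retrievable by key."""

    def __init__(self):
        self._originals = {}

    def compress(self, content: str, max_chars: int = 80) -> str:
        """Return content unchanged if short, else a preview plus a retrieval key."""
        if len(content) <= max_chars:
            return content
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = content
        return f"{content[:max_chars]}... [truncated, retrieve with key={key}]"

    def retrieve(self, key: str) -> str:
        """Return the original content stored under key (raises KeyError if unknown)."""
        return self._originals[key]


if __name__ == "__main__":
    store = ReversibleStore()
    log = "2026-01-01 INFO request handled\n" * 50
    short = store.compress(log)
    key = short.split("key=")[1].rstrip("]")
    print(len(log), "->", len(short), "characters")
    print("round trip ok:", store.retrieve(key) == log)
```

In an agent setting, the retrieval step would typically be exposed to the model as a tool call, so the model only pays for the full content when it actually asks for it.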
Links
- GitHub Repository: https://github.com/headroomlabs-ai/headroom
- Official Documentation: https://headroom-docs.vercel.app/docs
- Discord Community: https://discord.gg/yRmaUNpsPJ
- PyPI Package: https://pypi.org/project/headroom-ai/
- npm Package: https://www.npmjs.com/package/headroom-ai
- Kompress-v2-base Model: https://huggingface.co/chopratejas/kompress-v2-base