FreeLLMAPI: Stack 16 Free LLM Tiers for 1.7 Billion Tokens/Month
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
FreeLLMAPI is an OpenAI-compatible proxy that aggregates the free tiers of 16 LLM providers, offering access to approximately 1.7 billion tokens per month. It simplifies access to diverse models through a single endpoint, featuring smart routing, automatic failover, and encrypted key storage. This powerful tool is designed for personal experimentation, allowing developers to leverage multiple free LLM resources efficiently.
Introduction
FreeLLMAPI is a self-hosted, OpenAI-compatible proxy that consolidates the free tiers of 16 Large Language Model (LLM) providers. Stacked together, those tiers offer roughly 1.7 billion tokens per month through a single /v1 endpoint. The project targets personal experimentation and provides smart routing, automatic failover, and encrypted key storage.
Why Use FreeLLMAPI?
Individually, the free tiers offered by major AI labs often feel limited, and combining them by hand means managing multiple SDKs, tracking different rate limits, and handling request failures across many providers. FreeLLMAPI collapses that complexity into one OpenAI-compatible endpoint: point any OpenAI client library at your local FreeLLMAPI server, and it transparently routes requests across the providers for which you've added keys. Aggregated, the individual "toy" tiers become a substantial working inference capacity.
Key Features
FreeLLMAPI comes packed with features to enhance your LLM experimentation:
- OpenAI-compatible API: Works with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, etc.) by simply changing the base_url.
- Anthropic Messages API Support: Speaks Anthropic's wire format, allowing Claude clients and SDKs to run against your free LLM pool.
- Image Generation & Text-to-Speech: Routes requests for media models across supported providers.
- Streaming and Non-Streaming: Supports both Server-Sent Events for streaming and standard JSON responses.
- Tool Calling: Passes through OpenAI-style tools and tool_choice requests, supporting multi-step tool-use flows.
- Embeddings: Provides a /v1/embeddings endpoint with family-based routing, so failover only occurs between compatible models.
- Automatic Failover: Retries a request on the next available model in your fallback chain if a provider returns a 429 or 5xx response, or times out.
- Per-Key Rate Tracking: Tracks RPM, RPD, TPM, and TPD (requests and tokens per minute and per day) counters for each (platform, model, key) to stay within free-tier caps.
- Encrypted Key Storage: API keys are encrypted at rest with AES-256-GCM.
- Unified API Key: Clients authenticate to your proxy with a single freellmapi-... bearer token, keeping upstream provider keys private.
- Admin Dashboard: A React + Vite UI for managing keys, reordering the fallback chain, inspecting analytics, and using a prompt playground.
- Context Handoff: Optionally injects a compact system message when switching models mid-conversation to improve continuity.
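The failover and per-key rate tracking features listed above can be pictured with a small sketch. This is not FreeLLMAPI's actual implementation: `RateWindow`, `Route`, `is_retryable`, `complete_with_failover`, and the `send` callback are hypothetical names chosen for illustration, and only a requests-per-minute counter is modelled.

```python
import time
from collections import deque
from dataclasses import dataclass


class RateWindow:
    """Sliding 60-second request counter for one (platform, model, key) triple."""

    def __init__(self, rpm_limit, now=time.monotonic):
        self.rpm_limit = rpm_limit
        self._now = now
        self._stamps = deque()

    def try_acquire(self):
        t = self._now()
        # Drop timestamps that have left the 60-second window.
        while self._stamps and t - self._stamps[0] >= 60:
            self._stamps.popleft()
        if len(self._stamps) >= self.rpm_limit:
            return False
        self._stamps.append(t)
        return True


@dataclass
class Route:
    platform: str
    model: str
    window: RateWindow


def is_retryable(status):
    """429 and 5xx responses move the request to the next route."""
    return status == 429 or 500 <= status <= 599


def complete_with_failover(chain, send):
    """Try each route in order; send(route) returns (status, body) or raises TimeoutError."""
    for route in chain:
        if not route.window.try_acquire():
            continue  # local counter says this key is at its cap
        try:
            status, body = send(route)
        except TimeoutError:
            continue  # treat a timeout like a retryable failure
        if is_retryable(status):
            continue
        return route, status, body
    raise RuntimeError("every route in the fallback chain was exhausted")
```

Checking the local counter before sending means a key that is already at its cap is skipped without spending a request on a predictable 429.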
Installation
Getting FreeLLMAPI up and running is straightforward.
One-liner (Docker Required)
For a quick setup, use the provided install script:
curl -fsSL https://freellmapi.co/install.sh | bash
This script sets up ~/freellmapi, generates an encryption key, pulls the Docker image, and starts the container.
Manual Docker Compose
If you prefer a manual Docker Compose setup:
- Clone the repository:
  git clone https://github.com/tashfeenahmed/freellmapi.git
  cd freellmapi
- Generate an encryption key and create a .env file:
  ENCRYPTION_KEY="$(openssl rand -hex 32)"
  printf "ENCRYPTION_KEY=%s\nPORT=3001\n" "$ENCRYPTION_KEY" > .env
- Start the services:
  docker compose up -d
Access the dashboard at http://localhost:3001. Remember to add your provider keys and configure the fallback chain.
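Where `openssl` is not available, the key-generation step can be approximated in Python. This is a sketch, assuming the proxy only needs a 64-character hex string in ENCRYPTION_KEY, which is what the shell command in the manual setup produces; `render_env` is a hypothetical helper and not part of the project.

```python
import secrets


def render_env(port=3001):
    """Return .env contents with a fresh 32-byte (256-bit) hex key."""
    key = secrets.token_hex(32)  # same shape as `openssl rand -hex 32`
    return f"ENCRYPTION_KEY={key}\nPORT={port}\n"


if __name__ == "__main__":
    # Print rather than write, so an existing .env is never overwritten by accident.
    print(render_env(), end="")
```

Redirecting the output into `.env` yourself keeps the write explicit. Replacing the key after provider keys have been stored would presumably make them undecryptable, so generate it once and back it up.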
Desktop App
A native menu-bar application is available for macOS and Windows, providing a local router and dashboard directly from your system tray. You can download the latest .dmg or .exe installer from the GitHub Releases page.
Examples
Once FreeLLMAPI is running, you can use any OpenAI-compatible client.
Python
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",
)
# with_raw_response exposes the HTTP headers; the parsed completion object has no .headers attribute
raw = client.chat.completions.with_raw_response.create(
    model="auto",  # let the router pick; or specify e.g. "gemini-2.5-flash"
    messages=[{"role": "user", "content": "Summarise the fall of Rome in one sentence."}],
)
resp = raw.parse()  # the usual ChatCompletion object
print(resp.choices[0].message.content)
print("Routed via:", raw.headers.get("x-routed-via"))
curl
curl http://localhost:3001/v1/chat/completions \
  -H "Authorization: Bearer freellmapi-your-unified-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "hi"}]
  }'
Streaming
# Reuses the client created in the Python example
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Stream me a haiku about SQLite."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:  # skip chunks that carry no choices
        print(chunk.choices[0].delta.content or "", end="", flush=True)
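When the full text is needed after streaming finishes, the chunks can be accumulated instead of printed. The sketch below assumes only the OpenAI-style chunk shape (`choices[0].delta.content`); `collect_stream` and `fake_chunk` are hypothetical helpers, and the stand-in chunks let it run without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate delta.content from OpenAI-style streaming chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices (for example, usage-only chunks)
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in with the same attribute shape as a streaming chunk (for local testing)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Rows "), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("persist.")]
print(collect_stream(demo))
```

With a live server, passing the `stream` object returned by `client.chat.completions.create(..., stream=True)` to `collect_stream` should work the same way, since it iterates over chunks of this shape.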
Tool Calling
FreeLLMAPI supports OpenAI-style tool calling, including multi-step flows in which you run the requested tool and send its result back to the model.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
# 1. Model asks for a tool call
first = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Karachi?"}],
    tools=tools,
    tool_choice="required",
)
call = first.choices[0].message.tool_calls[0]
# 2. You execute the tool, feed the result back
final = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "user", "content": "What's the weather in Karachi?"},
        first.choices[0].message,
        {"role": "tool", "tool_call_id": call.id, "content": '{"temp_c": 32, "cond": "sunny"}'},
    ],
    tools=tools,
)
print(final.choices[0].message.content)
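The example hard-codes the tool result in step 2. A minimal sketch of the "you execute the tool" step might look like the following, assuming the standard OpenAI tool-call shape (`id`, `function.name`, and `function.arguments` as a JSON string). `TOOL_REGISTRY`, `run_tool_call`, and the `get_weather` body are hypothetical and return canned data.

```python
import json
from types import SimpleNamespace


def get_weather(city):
    """Hypothetical local tool; a real one would query a weather service."""
    return {"city": city, "temp_c": 32, "cond": "sunny"}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(call):
    """Execute one tool call and return the matching role="tool" message."""
    func = TOOL_REGISTRY.get(call.function.name)
    if func is None:
        result = {"error": f"unknown tool: {call.function.name}"}
    else:
        try:
            args = json.loads(call.function.arguments or "{}")
            result = func(**args)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or arguments that do not match the function signature.
            result = {"error": str(exc)}
    return {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}


# Stand-in shaped like first.choices[0].message.tool_calls[0]
demo_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Karachi"}'),
)
print(run_tool_call(demo_call))
```

Returning errors as tool content, rather than raising, gives the model a chance to correct its arguments on the next turn.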
Vision / Image Input
Send images using standard OpenAI image_url content blocks. The router automatically restricts requests to vision-capable models.
resp = client.chat.completions.create(
    model="auto",  # auto-routes to a vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,<...>"}},
        ],
    }],
)
print(resp.choices[0].message.content)
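The base64 payload in the data URL is elided in the example. One way to build such a URL from image bytes or a local file is sketched below using only the standard library; `to_data_url`, `file_to_data_url`, and `image_block` are hypothetical helper names.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(data, mime="image/png"):
    """Encode raw bytes as a data: URL with the given MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def file_to_data_url(path):
    """Read a local file and guess its MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    return to_data_url(Path(path).read_bytes(), mime or "application/octet-stream")


def image_block(data_url):
    """Wrap a URL in an OpenAI-style image_url content block."""
    return {"type": "image_url", "image_url": {"url": data_url}}
```

For instance, `image_block(file_to_data_url("photo.png"))` could take the place of the image entry in the `content` list. Base64 inflates the payload by about a third, so large images may run into provider request-size limits.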
Links
- GitHub Repository: tashfeenahmed/freellmapi
- Official Website: freellmapi.co
- Desktop App Releases: GitHub Releases