FreeLLMAPI: Stack 16 Free LLM Tiers for 1.7 Billion Tokens/Month

Introduction

FreeLLMAPI is a self-hosted, OpenAI-compatible proxy that consolidates the free tiers of 16 Large Language Model (LLM) providers. Stacked together, these tiers provide roughly 1.7 billion tokens per month through a single /v1 endpoint. The project is aimed at personal experimentation and offers smart routing, automatic failover, and encrypted key storage.

Why Use FreeLLMAPI?

Individually, the free tiers offered by major AI labs often feel limited. Combining them by hand means managing multiple SDKs, tracking different rate limits, and handling request failures across many providers. FreeLLMAPI collapses this complexity into one unified, OpenAI-compatible endpoint: you point any OpenAI client library at your local FreeLLMAPI server, and it transparently routes requests across the providers for which you've added keys. In aggregate, the individual "toy" tiers add up to substantial working inference capacity.

Key Features

FreeLLMAPI provides the following features:

OpenAI-compatible API: Works seamlessly with official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, etc.) by simply changing the base_url.
Anthropic Messages API Support: Integrates with Anthropic's wire format, allowing Claude clients and SDKs to run against your free LLM pool.
Image Generation & Text-to-Speech: Routes requests for media models across supported providers.
Streaming and Non-Streaming: Supports both Server-Sent Events for streaming and standard JSON responses.
Tool Calling: Passes through OpenAI-style tools and tool_choice requests, supporting multi-step tool-use flows.
Embeddings: Provides a /v1/embeddings endpoint with family-based routing, ensuring failover only occurs between compatible models.
Automatic Failover: Retries a request on the next available model in your fallback chain if a provider returns a 429 or 5xx response, or times out.
Per-Key Rate Tracking: Monitors RPM, RPD, TPM, and TPD counters for each (platform, model, key) to stay within free-tier caps.
Encrypted Key Storage: API keys are encrypted with AES-256-GCM for enhanced security.
Unified API Key: Clients authenticate to your proxy with a single freellmapi-... bearer token, keeping upstream provider keys private.
Admin Dashboard: A React + Vite UI for managing keys, reordering the fallback chain, inspecting analytics, and using a prompt playground.
Context Handoff: Optionally injects a compact system message when switching models mid-conversation to improve continuity.
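
The failover and per-key rate tracking described above can be pictured with a small sketch. This is not FreeLLMAPI's actual implementation: `Route`, `UpstreamError`, `call_with_failover`, and the 60-second sliding window used for the RPM counter are all hypothetical choices made for illustration, and the "providers" are local stub functions.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


class UpstreamError(Exception):
    """Hypothetical error carrying the HTTP status returned by a provider."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status


@dataclass
class Route:
    """One (platform, model, key) entry in the fallback chain (illustrative)."""

    name: str
    rpm_limit: int
    send: Callable[[str], str]  # performs the upstream call
    _stamps: deque = field(default_factory=deque)

    def has_budget(self, now: float) -> bool:
        # Drop timestamps older than 60 s, then compare against the RPM cap.
        while self._stamps and now - self._stamps[0] >= 60:
            self._stamps.popleft()
        return len(self._stamps) < self.rpm_limit

    def record(self, now: float) -> None:
        self._stamps.append(now)


def is_retryable(exc: Exception) -> bool:
    # The conditions named in the feature list: 429, 5xx, or a timeout.
    if isinstance(exc, TimeoutError):
        return True
    return isinstance(exc, UpstreamError) and (
        exc.status == 429 or 500 <= exc.status < 600
    )


def call_with_failover(chain, prompt, clock=time.monotonic):
    """Try each route in order and return (route_name, reply)."""
    for route in chain:
        now = clock()
        if not route.has_budget(now):
            continue  # local counter says this key is at its cap
        route.record(now)
        try:
            return route.name, route.send(prompt)
        except Exception as exc:
            if not is_retryable(exc):
                raise  # e.g. a 400 would fail on every provider
    raise RuntimeError("all routes exhausted")


# Stub providers standing in for real upstream calls.
def always_429(prompt: str) -> str:
    raise UpstreamError(429)


def echo(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [Route("provider-a", 10, always_429), Route("provider-b", 10, echo)]
print(call_with_failover(chain, "hi"))  # ('provider-b', 'echo: hi')
```

Checking the local counter before sending is what lets a router skip a key that is already at its cap instead of spending a request to discover the 429.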

Installation

Getting FreeLLMAPI up and running is straightforward.

One-liner (Docker Required)

For a quick setup, use the provided install script:

```
curl -fsSL https://freellmapi.co/install.sh | bash
```

This script sets up ~/freellmapi, generates an encryption key, pulls the Docker image, and starts the container.

Manual Docker Compose

If you prefer a manual Docker Compose setup:

Clone the repository:

```
git clone https://github.com/tashfeenahmed/freellmapi.git
cd freellmapi
```

Generate an encryption key and create a .env file:

```
ENCRYPTION_KEY="$(openssl rand -hex 32)"
printf "ENCRYPTION_KEY=%s\nPORT=3001\n" "$ENCRYPTION_KEY" > .env
```
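
On a machine without openssl, the same .env contents could be produced with Python's standard library. This is only a sketch under the assumption that the server accepts any 64-character hex string as ENCRYPTION_KEY, as the openssl command above suggests; `build_env` is a hypothetical helper name.

```python
import secrets


def build_env(port: int = 3001) -> str:
    """Return .env contents with a fresh 256-bit key as 64 hex characters."""
    key = secrets.token_hex(32)  # 32 random bytes, like `openssl rand -hex 32`
    return f"ENCRYPTION_KEY={key}\nPORT={port}\n"


if __name__ == "__main__":
    # Redirect the output into .env once you are happy with it.
    print(build_env(), end="")
```

Thirty-two random bytes give the 256 bits that AES-256 needs, which is why the hex string is 64 characters long.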

Start the services:
```
docker compose up -d
```
Access the dashboard at http://localhost:3001. Remember to add your provider keys and configure the fallback chain.

Desktop App

A native menu-bar application is available for macOS and Windows, providing a local router and dashboard directly from your system tray. You can download the latest .dmg or .exe installer from the GitHub Releases page.

Examples

Once FreeLLMAPI is running, you can use any OpenAI-compatible client.

Python

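```
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",
)

# The parsed completion object has no .headers attribute, so request the
# raw HTTP response to read response headers alongside the body.
raw = client.chat.completions.with_raw_response.create(
    model="auto",  # let the router pick; or specify e.g. "gemini-2.5-flash"
    messages=[{"role": "user", "content": "Summarise the fall of Rome in one sentence."}],
)
resp = raw.parse()  # the usual ChatCompletion object
print(resp.choices[0].message.content)
print("Routed via:", raw.headers.get("x-routed-via"))
```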

curl

```
curl http://localhost:3001/v1/chat/completions \
  -H "Authorization: Bearer freellmapi-your-unified-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "hi"}]
  }'
```

Streaming

```
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Stream me a haiku about SQLite."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:  # skip chunks that carry no choices (e.g. usage-only chunks)
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```
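
The SDK hides the wire format that the loop above consumes. As a rough sketch, assuming the proxy emits the standard OpenAI Server-Sent Events framing (one `data: {json}` line per chunk, ending with `data: [DONE]`), the text deltas could be extracted by hand as follows. `iter_sse_deltas` is a hypothetical helper and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample stream, not captured from a real server.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Rows "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "at rest"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))  # Rows at rest
```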

Tool Calling

FreeLLMAPI supports OpenAI-style tool calling. The example below runs a two-step flow: the model first requests a tool call, then receives the tool's result and produces the final answer.

```
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# 1. Model asks for a tool call
first = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Karachi?"}],
    tools=tools,
    tool_choice="required",
)
call = first.choices[0].message.tool_calls[0]

# 2. You execute the tool, feed the result back
final = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "user", "content": "What's the weather in Karachi?"},
        first.choices[0].message,
        {"role": "tool", "tool_call_id": call.id, "content": '{"temp_c": 32, "cond": "sunny"}'},
    ],
    tools=tools,
)
print(final.choices[0].message.content)
```
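
The example above hardcodes the tool result. One possible way to fill in step 2 is a small dispatcher that maps the requested tool name to a local function. This is a sketch only: `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and `get_weather` returns placeholder data rather than querying a real service.

```python
import json


def get_weather(city: str) -> dict:
    # Placeholder data; a real implementation would query a weather service.
    return {"city": city, "temp_c": 32, "cond": "sunny"}


# Maps tool names declared in `tools` to local callables (illustrative).
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute a registered tool and return its result as a JSON string."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})


print(run_tool_call("get_weather", '{"city": "Karachi"}'))
```

With the OpenAI SDK, the tool message's content would then be `run_tool_call(call.function.name, call.function.arguments)`. Returning errors as JSON instead of raising gives the model a chance to correct its arguments on the next turn.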

Vision / Image Input

Send images using standard OpenAI image_url content blocks. The router automatically restricts requests to vision-capable models.

```
resp = client.chat.completions.create(
    model="auto",  # auto-routes to a vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,<...>"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```
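
The base64 payload in the data URL above has to be built from the image bytes. A minimal sketch using only the standard library follows; `to_data_url` and `image_block` are hypothetical helper names, and the MIME type is guessed from the file extension rather than from the bytes themselves.

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type by extension."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {filename}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_block(image_bytes: bytes, filename: str) -> dict:
    """Build an OpenAI-style image_url content block."""
    return {
        "type": "image_url",
        "image_url": {"url": to_data_url(image_bytes, filename)},
    }
```

To use it, read a local file in binary mode and pass `image_block(data, path)` in place of the hand-written image_url entry in the content list.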
