# RouteLLM: Optimize LLM Costs and Maintain Quality with Intelligent Routing

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/lm-sys-routellm
Generated for open source discovery and AI-assisted research.

RouteLLM is a framework for serving and evaluating LLM routers, which cut costs by sending simpler queries to cheaper models and reserving stronger, more expensive models for queries that need them. It can be used as a drop-in replacement for an existing OpenAI client or run as an OpenAI-compatible server, helping teams balance LLM deployment costs against model capability.

GitHub: https://github.com/lm-sys/RouteLLM
OSRepos URL: https://osrepos.com/repo/lm-sys-routellm


## Topics

- Python
- LLM Routing
- AI
- Cost Optimization
- Machine Learning
- Large Language Models
- API Proxy

## Repository Information

Last analyzed by OSRepos: Sun Jul 05 2026 13:58:02 GMT+0100 (Western European Summer Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Introduction

RouteLLM is a framework for serving and evaluating Large Language Model (LLM) routers. It addresses a common dilemma in deploying LLMs: powerful models such as GPT-4 are expensive, while cheaper alternatives may produce lower-quality answers. RouteLLM routes simpler queries to smaller, more cost-effective models and reserves the strong model for harder ones, reducing operational costs while keeping response quality high.

In the authors' evaluations, RouteLLM's routers reduced LLM costs by up to 85% while preserving 95% of GPT-4's performance on widely used benchmarks. The authors also report performance comparable to commercial routing offerings at a substantially lower cost.
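
To see how routing translates into savings, the sketch below computes the expected per-query cost when a fraction of queries goes to the strong model and the rest to the weak one. The prices are hypothetical placeholders, not published rates, and the calculation ignores the router's own inference cost.

```python
def blended_cost(strong_fraction: float, strong_cost: float, weak_cost: float) -> float:
    """Expected cost per query when `strong_fraction` of queries hit the strong model."""
    if not 0.0 <= strong_fraction <= 1.0:
        raise ValueError("strong_fraction must be in [0, 1]")
    return strong_fraction * strong_cost + (1.0 - strong_fraction) * weak_cost


def savings_vs_strong_only(strong_fraction: float, strong_cost: float, weak_cost: float) -> float:
    """Fractional saving relative to sending every query to the strong model."""
    return 1.0 - blended_cost(strong_fraction, strong_cost, weak_cost) / strong_cost


# Hypothetical per-query prices: the weak model is 20x cheaper than the strong one.
STRONG, WEAK = 1.0, 0.05
for pct in (1.0, 0.5, 0.2):
    print(f"{pct:.0%} strong -> saves {savings_vs_strong_only(pct, STRONG, WEAK):.1%}")
```

Algebraically the saving is (1 - p)(1 - c_weak / c_strong), so it is capped by the price gap between the two models and shrinks as more traffic is sent to the strong model.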

For more details, you can refer to the [official blog post](http://lmsys.org/blog/2024-07-01-routellm/) and the [research paper](https://arxiv.org/abs/2406.18665).

## Installation

Getting started with RouteLLM is straightforward. You can install it via PyPI or directly from the source.

**From PyPI**

```bash
pip install "routellm[serve,eval]"
```


**From source**

```bash
git clone https://github.com/lm-sys/RouteLLM.git
cd RouteLLM
pip install -e ".[serve,eval]"
```


## Examples

RouteLLM offers flexible ways to integrate LLM routing into your applications, either by replacing an existing OpenAI client or by launching an OpenAI-compatible server.

### Python Client Replacement

Here's a quick walkthrough on how to replace your existing OpenAI client to route queries between LLMs using RouteLLM.

1.  **Initialize the Controller**: Replace your OpenAI client by initializing the RouteLLM controller with a router, for example, the `mf` router.
    ```python
    import os
    from routellm.controller import Controller

    os.environ["OPENAI_API_KEY"] = "sk-XXXXXX"
    # Replace with your model provider; Anyscale's Mixtral is used here.
    os.environ["ANYSCALE_API_KEY"] = "esecret_XXXXXX"

    client = Controller(
        routers=["mf"],
        strong_model="gpt-4-1106-preview",
        weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
    )
    ```

    You can customize the strong and weak models, as well as their providers.

2.  **Calibrate the Cost Threshold**: Each routing request uses a cost threshold that controls the tradeoff between cost and quality. Calibrate the threshold for the kinds of queries you expect to receive. For instance, to target 50% GPT-4 calls using Chatbot Arena data:

    ```bash
    python -m routellm.calibrate_threshold --routers mf --strong-model-pct 0.5 --config config.example.yaml
    ```

    The command prints the recommended threshold for that target; the next step uses 0.11593 as an example value.

3.  **Make a Routed Request**: Update the `model` field in your completion requests to specify the router and the calibrated threshold.
    ```python
    response = client.chat.completions.create(
        # Use the MF router with a cost threshold of 0.11593
        model="router-mf-0.11593",
        messages=[
            {"role": "user", "content": "Hello!"}
        ]
    )
    ```

    Each request is then sent to the strong or the weak model according to the router's assessment of the query, lowering costs while aiming to keep response quality high.
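
The calibration and request steps can be approximated in plain Python to show what the threshold means. This sketch assumes a router assigns each query a score (a predicted strong-model win rate) and sends it to the strong model when the score is at or above the threshold; calibration then amounts to picking a quantile of scores from a sample of representative queries. The scores and the helpers `calibrate_threshold`, `route`, and `router_model_name` are hypothetical illustrations, not RouteLLM's API; the bundled `routellm.calibrate_threshold` command is what you would use in practice. Only the `router-<name>-<threshold>` model string comes from the example above.

```python
import math
from typing import Sequence


def calibrate_threshold(scores: Sequence[float], strong_model_pct: float) -> float:
    """Hypothetical helper: pick a threshold so roughly `strong_model_pct` of scores are >= it.

    `scores` stand in for a router's predicted strong-model win rates
    on a sample of your own queries.
    """
    if not scores:
        raise ValueError("need at least one score")
    if not 0.0 < strong_model_pct <= 1.0:
        raise ValueError("strong_model_pct must be in (0, 1]")
    ordered = sorted(scores, reverse=True)
    # How many of the sampled queries should go to the strong model.
    k = max(1, math.ceil(strong_model_pct * len(ordered)))
    # The k-th highest score: at least k scores are >= this value (ties can add more).
    return ordered[k - 1]


def route(score: float, threshold: float) -> str:
    """Assumed routing rule: use the strong model when the score meets the threshold."""
    return "strong" if score >= threshold else "weak"


def router_model_name(router: str, threshold: float) -> str:
    """Build the `model` string for a routed request, e.g. "router-mf-0.11593"."""
    return f"router-{router}-{threshold}"


# Hypothetical scores for eight sample queries.
sample_scores = [0.05, 0.31, 0.12, 0.48, 0.09, 0.22, 0.67, 0.15]
threshold = calibrate_threshold(sample_scores, strong_model_pct=0.5)
strong_share = sum(route(s, threshold) == "strong" for s in sample_scores) / len(sample_scores)
print(threshold, strong_share, router_model_name("mf", threshold))
```

Raising the threshold sends fewer queries to the strong model (cheaper, potentially lower quality); lowering it does the opposite.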

### Server & Demo

Alternatively, you can launch an OpenAI-compatible server that works with any existing OpenAI client.

1.  **Launch the Server**:
    ```bash
    export OPENAI_API_KEY=sk-XXXXXX
    export ANYSCALE_API_KEY=esecret_XXXXXX
    python -m routellm.openai_server --routers mf --strong-model gpt-4-1106-preview --weak-model anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1
    ```

    The server will start on `http://0.0.0.0:6060`.

2.  **Start a Local Router Chatbot Demo**:
    ```bash
    python -m examples.router_chat --router mf --threshold 0.11593
    ```

    This allows you to interact with the router and observe how different messages are handled.
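
Because the server speaks the OpenAI chat-completions protocol, any HTTP client should be able to reach it, not only the OpenAI SDK. The sketch below uses only the standard library and builds the request without sending it, so it can be inspected without a running server. It assumes the conventional OpenAI path `/v1/chat/completions` and connects via `localhost`, since the server binds to `0.0.0.0`; `build_chat_request` is a hypothetical helper. The `model` field carries the same router string as in the Python client example.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions POST request."""
    payload = {
        "model": model,  # router name and threshold, e.g. "router-mf-0.11593"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://localhost:6060", "router-mf-0.11593", "Hello!")
print(req.get_method(), req.full_url)

# With the server running, send it and read the routed completion:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```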

## Why Use RouteLLM

RouteLLM provides compelling advantages for anyone deploying LLMs in production:

*   **Significant Cost Savings**: Cut costs by up to 85% in the authors' benchmarks by routing queries that do not need the strong model to a cheaper one.
*   **High Performance**: Retain 95% of GPT-4's performance on widely used benchmarks at those savings, according to the authors' evaluations.
*   **OpenAI Client Compatibility**: Seamlessly integrate RouteLLM into existing applications as a drop-in replacement for OpenAI's client or by using its OpenAI-compatible server.
*   **Extensive Model Support**: Uses [LiteLLM](https://github.com/BerriAI/litellm) to support a wide range of open-source and closed models from various providers, including local models via Ollama.
*   **Pre-trained Routers**: Benefit from out-of-the-box trained routers, with the `mf` router being highly recommended for its strength and lightweight nature. These routers generalize well to different model pairs.
*   **Customizable Routing Strategies**: Easily extend the framework to include new routers and compare their performance across multiple benchmarks.
*   **Threshold Calibration**: Fine-tune the cost-quality tradeoff by calibrating routing thresholds based on your specific dataset and desired strong model call percentage.
*   **Comprehensive Evaluation Framework**: Evaluate routing strategies on benchmarks such as MMLU, GSM8K, and MT-Bench to compare their cost-quality tradeoffs.
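
Extending the framework means supplying a new router. The sketch below shows the general shape under stated assumptions: a router maps a prompt to a predicted strong-model win rate, and a shared `route` method compares that score with the threshold to pick a model. `BaseRouter`, `ModelPair`, and `LengthRouter` are hypothetical and deliberately toy-like (prompt length is a poor difficulty signal); check the router implementations in the repository for the actual base class and registration mechanism before writing a real one.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple


class ModelPair(NamedTuple):
    strong: str
    weak: str


class BaseRouter(ABC):
    """Hypothetical interface: score a prompt, then compare the score with a threshold."""

    @abstractmethod
    def calculate_strong_win_rate(self, prompt: str) -> float:
        """Return a score in [0, 1]; higher means the strong model is more likely needed."""

    def route(self, prompt: str, threshold: float, pair: ModelPair) -> str:
        # Send the prompt to the strong model only when its score meets the threshold.
        if self.calculate_strong_win_rate(prompt) >= threshold:
            return pair.strong
        return pair.weak


class LengthRouter(BaseRouter):
    """Toy router that treats longer prompts as harder. For illustration only."""

    def __init__(self, saturation_chars: int = 500):
        self.saturation_chars = saturation_chars

    def calculate_strong_win_rate(self, prompt: str) -> float:
        # Scale prompt length into [0, 1], saturating at `saturation_chars`.
        return min(1.0, len(prompt) / self.saturation_chars)


pair = ModelPair(strong="gpt-4-1106-preview", weak="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1")
router = LengthRouter()
print(router.route("Hello!", threshold=0.5, pair=pair))
print(router.route("Prove that " + "x" * 400, threshold=0.5, pair=pair))
```

Keeping the threshold comparison in the base class means each new router only has to supply a scoring function, which makes it straightforward to compare routers on the same benchmarks and thresholds.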

## Links

*   **GitHub Repository**: [lm-sys/RouteLLM](https://github.com/lm-sys/RouteLLM)
*   **Official Blog Post**: [RouteLLM Blog](http://lmsys.org/blog/2024-07-01-routellm/)
*   **Research Paper**: [RouteLLM Paper](https://arxiv.org/abs/2406.18665)