{"name":"RouteLLM: Optimize LLM Costs and Maintain Quality with Intelligent Routing","description":"RouteLLM is a framework for serving and evaluating LLM routers, enabling significant cost savings while largely preserving response quality. It routes simpler queries to cheaper models while maintaining high performance, and works as a drop-in replacement for existing OpenAI clients or as an OpenAI-compatible server. This helps balance LLM deployment costs against model capabilities.","github":"https://github.com/lm-sys/RouteLLM","url":"https://osrepos.com/repo/lm-sys-routellm","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/lm-sys-routellm","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/lm-sys-routellm.md","json":"https://osrepos.com/repo/lm-sys-routellm.json","topics":["Python","LLM Routing","AI","Cost Optimization","Machine Learning","Large Language Models","API Proxy"],"keywords":["Python","LLM Routing","AI","Cost Optimization","Machine Learning","Large Language Models","API Proxy"],"stars":null,"summary":"RouteLLM is a framework for serving and evaluating LLM routers, enabling significant cost savings while largely preserving response quality. It routes simpler queries to cheaper models while maintaining high performance, and works as a drop-in replacement for existing OpenAI clients or as an OpenAI-compatible server. This helps balance LLM deployment costs against model capabilities.","content":"## Introduction\n\nRouteLLM is a framework for serving and evaluating Large Language Model (LLM) routers. It addresses a common dilemma in deploying LLMs: powerful models such as GPT-4 are expensive, while cheaper alternatives may produce lower-quality responses. 
RouteLLM routes simpler queries to smaller, more cost-effective models, significantly reducing operational expenses while maintaining high-quality responses.\n\nIn the authors' evaluations, RouteLLM's routers reduced LLM costs by up to 85% while preserving 95% of GPT-4's performance on widely used benchmarks such as MT-Bench. They also achieve performance comparable to commercial offerings at a substantially lower cost.\n\nFor more details, see the [official blog post](http://lmsys.org/blog/2024-07-01-routellm/) and the [research paper](https://arxiv.org/abs/2406.18665).\n\n## Installation\n\nYou can install RouteLLM from PyPI or directly from source.\n\n**From PyPI**\n\n```bash\npip install \"routellm[serve,eval]\"\n```\n\n**From source**\n\n```bash\ngit clone https://github.com/lm-sys/RouteLLM.git\ncd RouteLLM\npip install -e \".[serve,eval]\"\n```\n\n## Examples\n\nYou can integrate RouteLLM into an application in two ways: by replacing an existing OpenAI client, or by launching an OpenAI-compatible server.\n\n### Python Client Replacement\n\nThe following walkthrough shows how to replace an existing OpenAI client with RouteLLM so that queries are routed between two LLMs.\n\n1.  **Initialize the Controller**: Replace your OpenAI client with a RouteLLM `Controller` initialized with a router, for example the `mf` router.\n    ```python\n    import os\n    from routellm.controller import Controller\n\n    os.environ[\"OPENAI_API_KEY\"] = \"sk-XXXXXX\"\n    # Replace with your own model provider's key; this example uses Mixtral via Anyscale.\n    os.environ[\"ANYSCALE_API_KEY\"] = \"esecret_XXXXXX\"\n\n    client = Controller(\n      routers=[\"mf\"],\n      strong_model=\"gpt-4-1106-preview\",\n      weak_model=\"anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1\",\n    )\n    ```\n\n    You can customize the strong and weak models, as well as their providers.\n\n2.  
**Calibrate the Cost Threshold**: Each routing request uses a cost threshold that controls the tradeoff between cost and quality. Because the right value depends on the kinds of queries you serve, calibrate it on representative data. For example, to calibrate the threshold so that 50% of calls go to GPT-4, based on Chatbot Arena data:\n    ```bash\n    python -m routellm.calibrate_threshold --routers mf --strong-model-pct 0.5 --config config.example.yaml\n    ```\n\n    The command prints the recommended threshold value.\n\n3.  **Make a Routed Request**: Set the `model` field of your completion requests to `router-<router>-<threshold>`, using the router name and the calibrated threshold.\n    ```python\n    response = client.chat.completions.create(\n      # This tells RouteLLM to use the MF router with a cost threshold of 0.11593\n      model=\"router-mf-0.11593\",\n      messages=[\n        {\"role\": \"user\", \"content\": \"Hello!\"}\n      ]\n    )\n    ```\n\n    Each request is then routed to either the strong or the weak model, reducing costs while maintaining response quality.\n\n### Server & Demo\n\nAlternatively, you can launch an OpenAI-compatible server that works with any existing OpenAI client.\n\n1.  **Launch the Server**:\n    ```bash\n    export OPENAI_API_KEY=sk-XXXXXX\n    export ANYSCALE_API_KEY=esecret_XXXXXX\n    python -m routellm.openai_server --routers mf --strong-model gpt-4-1106-preview --weak-model anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1\n    ```\n\n    The server starts on `http://0.0.0.0:6060`.\n\n2.  
**Start a Local Router Chatbot Demo**:\n    ```bash\n    python -m examples.router_chat --router mf --threshold 0.11593\n    ```\n\n    This starts a local chatbot that lets you interact with the router and observe how different messages are routed.\n\n## Why Use RouteLLM\n\nRouteLLM offers several advantages for deploying LLMs in production:\n\n*   **Significant Cost Savings**: Reduce costs by up to 85% while retaining most of the strong model's quality, by routing each query to the most appropriate model.\n*   **High Performance**: Maintain 95% of GPT-4's performance on widely used benchmarks.\n*   **OpenAI Client Compatibility**: Integrate RouteLLM into existing applications as a drop-in replacement for OpenAI's client, or through its OpenAI-compatible server.\n*   **Extensive Model Support**: RouteLLM uses [LiteLLM](https://github.com/BerriAI/litellm) to support a wide range of open-source and closed models from various providers, including local models served via Ollama.\n*   **Pre-trained Routers**: Routers come trained out of the box; the `mf` router is recommended because it is both strong and lightweight. 
These routers generalize well to different model pairs.\n*   **Customizable Routing Strategies**: Extend the framework with new routers and compare their performance across multiple benchmarks.\n*   **Threshold Calibration**: Tune the cost-quality tradeoff by calibrating the routing threshold for your own dataset and target percentage of strong-model calls.\n*   **Comprehensive Evaluation Framework**: Compare routing strategies on benchmarks such as MMLU, GSM8K, and MT-Bench to choose the one that best fits your cost and quality requirements.\n\n## Links\n\n*   **GitHub Repository**: [lm-sys/RouteLLM](https://github.com/lm-sys/RouteLLM)\n*   **Official Blog Post**: [RouteLLM Blog](http://lmsys.org/blog/2024-07-01-routellm/)\n*   **Research Paper**: [RouteLLM Paper](https://arxiv.org/abs/2406.18665)","metrics":{"detailViews":2,"githubClicks":1},"dates":{"published":null,"modified":"2026-07-05T12:58:02.000Z"}}