vLLM CLI: A Powerful Command-Line Interface for Serving LLMs with vLLM

Summary

vLLM CLI is an intuitive command-line interface tool designed to simplify serving Large Language Models with vLLM. It offers both an interactive terminal mode and direct CLI commands, enabling efficient model management, real-time server monitoring, and advanced configuration. The tool streamlines LLM deployment and management, from quick local experiments to automated, scripted workflows.

Repository Info

Updated on January 20, 2026

Introduction

vLLM CLI is a powerful and intuitive command-line interface tool designed to simplify the process of serving Large Language Models (LLMs) using the vLLM library. It provides a comprehensive suite of features for managing, configuring, and monitoring your LLM inference servers, catering to both interactive use and automated scripting.

Key features include a rich interactive terminal mode, direct CLI commands for automation, automatic discovery of local models with HuggingFace and Ollama support, and flexible configuration profiles. Recent updates have introduced an experimental Multi-Model Proxy server for unified API access to multiple LLMs, hardware-optimized profiles for GPT-OSS models on NVIDIA GPUs, and a convenient shortcuts system for quick launches.
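
For example, a configuration profile can be applied directly when launching a model. The --profile flag and the low_memory profile name below are illustrative assumptions; check your installed version (for instance via vllm-cli serve --help) for the exact flags and built-in profile names available.

# Serve a model with a named configuration profile (flag and profile name are assumptions)
vllm-cli serve --model openai/gpt-oss-20b --profile low_memory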

Installation

Important: vLLM Installation Notes
vLLM ships pre-compiled CUDA kernels that must match your installed PyTorch version exactly; mismatched versions lead to import or runtime errors. For this reason, vLLM CLI does not install vLLM or PyTorch by default.

Option 1: Install vLLM separately and then install vLLM CLI (Recommended)

# Install vLLM -- Skip this step if you have vLLM installed in your environment
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm --torch-backend=auto
# Or specify a backend: uv pip install vllm --torch-backend=cu128

# Install vLLM CLI
uv pip install --upgrade vllm-cli
uv run vllm-cli

# If you are using conda:
# Activate the environment you have vLLM installed in
pip install vllm-cli
vllm-cli

Option 2: Install vLLM CLI + vLLM

pip install vllm-cli[vllm]
vllm-cli
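
Whichever option you use, it is worth confirming afterwards that vLLM imports cleanly against the installed PyTorch build. The check below relies only on standard version attributes of both packages:

# Verify that the installed vLLM and PyTorch builds load together
python -c "import torch, vllm; print('torch', torch.__version__, 'cuda', torch.version.cuda); print('vllm', vllm.__version__)"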

Prerequisites:

  • Python 3.9+
  • CUDA-compatible GPU (recommended; see the quick check after this list)
  • vLLM package installed
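
If you are unsure whether a CUDA-capable GPU is actually visible to PyTorch, the following quick check can save a failed launch; it uses only PyTorch's standard CUDA helpers:

# Confirm PyTorch can see at least one CUDA device before serving
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"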

Examples

Interactive Mode
Launch the menu-driven interface for easy navigation and management.

vllm-cli

Serve a Model
Quickly serve a specific model using a direct command.

vllm-cli serve --model openai/gpt-oss-20b

Use a Shortcut
Launch pre-configured model and profile combinations with a simple shortcut.

vllm-cli serve --shortcut my-model
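
Check Server Status
View servers started through the CLI while they run. The status subcommand shown here reflects the tool's server-monitoring features, but treat the exact name as an assumption and run vllm-cli --help to confirm the commands available in your version.

vllm-cli status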

Why Use vLLM CLI?

vLLM CLI streamlines the often complex process of deploying and managing LLMs with vLLM. It offers a user-friendly interactive terminal for configuration and monitoring, alongside direct command-line options for automation. With automatic model discovery, real-time server monitoring, and optimized configuration profiles, it lets you serve models from HuggingFace and Ollama efficiently while making good use of available hardware.
