Repository History

2 repositories tagged with vLLM

Topic: vLLM

LLM Compressor: Optimize LLMs for Deployment with vLLM

LLM Compressor is a Transformers-compatible Python library that applies compression algorithms to Large Language Models (LLMs). It targets optimized deployment, especially with vLLM, and offers quantization techniques for weights, activations, and the KV cache. It integrates with Hugging Face models.

Analyzed Jul 4, 2026

chatterbox-vllm: Accelerating Chatterbox TTS with vLLM for Enhanced Performance

chatterbox-vllm is a port of the Chatterbox Text-to-Speech (TTS) model to vLLM, designed to improve generation speed and GPU memory efficiency. This personal project aims to make speech synthesis more efficient and easier to integrate, and reports substantial speedups over the original implementation. It is currently usable and reports benchmark-topping throughput, but it relies on internal vLLM APIs and hacky workarounds; further refactoring is planned.

Analyzed Oct 11, 2025
