Optimum: Accelerate Hugging Face Models with Hardware Optimization

Summary

Optimum extends Hugging Face Transformers, Diffusers, TIMM, and Sentence-Transformers with a suite of optimization tools for training and running models at maximum efficiency on targeted hardware, while keeping the developer experience simple. It delivers performance gains across both inference and training workflows.

Introduction

Optimum is a powerful extension of Hugging Face Transformers, Diffusers, TIMM, and Sentence-Transformers, providing a comprehensive set of optimization tools. Its primary goal is to enable maximum efficiency for training and running models on targeted hardware, all while maintaining ease of use for developers. Optimum helps accelerate your machine learning workflows by leveraging various hardware-specific optimizations. For more detailed information, refer to the official Optimum documentation.

Installation

Getting started with Optimum is straightforward. You can install the core library using pip:

python -m pip install optimum

For accelerator-specific features, Optimum provides additional installation options. These specialized installations ensure you have the necessary dependencies for optimizing models on particular hardware. Please consult the official documentation for detailed instructions on installing accelerator-specific components, such as those for ONNX Runtime, OpenVINO, or Habana.
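For example, the extras documented in the Optimum repository pull in the dependencies for a given backend (the exact extras list can change between releases, so treat these as illustrative):

python -m pip install optimum[onnxruntime]   # ONNX / ONNX Runtime
python -m pip install optimum[openvino]      # Intel OpenVINO
python -m pip install optimum[habana]        # Habana Gaudi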

Examples

Optimum offers extensive capabilities for both accelerated inference and training across a variety of hardware and software ecosystems.

Accelerated Inference

Optimum provides multiple tools to export and run optimized models efficiently:

  • ONNX / ONNX Runtime: Export models to the ONNX format and run them with the high-performance ONNX Runtime. Learn more about ONNX export and running ONNX models; a minimal usage sketch follows this list.
  • Intel OpenVINO: Optimize, quantize, and deploy deep learning models on Intel hardware. Detailed guides are available in the OpenVINO documentation.
  • ExecuTorch: Leverage PyTorch’s native solution for on-device inference across mobile and edge devices. Explore ExecuTorch export.
  • Quanto: A PyTorch quantization backend that allows for model quantization via Python API or optimum-cli. Find more details and examples in the Quanto repository.
  • NVIDIA TensorRT-LLM: Accelerate large language models on NVIDIA GPUs. Refer to the Optimum NVIDIA blog post for more information.
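
To make the ONNX Runtime path concrete, here is a minimal sketch that exports a checkpoint to ONNX and runs it through a standard pipeline. The checkpoint name is only an example; the export=True pattern follows the Optimum documentation.

# Minimal sketch: export a Transformers checkpoint to ONNX and run it
# with ONNX Runtime via Optimum. The model name below is illustrative.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch weights to ONNX on the fly
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum made this model faster to run."))

The same export is also available from the command line, e.g. optimum-cli export onnx --model distilbert-base-uncased-finetuned-sst-2-english onnx_output/.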

Accelerated Training

Optimum also provides wrappers around the original Transformers Trainer, simplifying accelerated training on dedicated hardware; for example, optimum-habana offers a GaudiTrainer for Habana Gaudi processors, and optimum.onnxruntime offers an ORTTrainer that trains with ONNX Runtime acceleration. A sketch of the latter follows.
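
Here is a minimal sketch of the ONNX Runtime training wrapper. It assumes the onnxruntime-training dependencies are installed, and the model, dataset, and hyperparameters are illustrative choices, not requirements of the API.

# Minimal sketch: fine-tune with ONNX Runtime acceleration via ORTTrainer.
# Assumes onnxruntime-training is installed; model, dataset, and
# hyperparameters below are illustrative.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments

model_id = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# A tiny slice of IMDB, tokenized for sequence classification
train_ds = load_dataset("imdb", split="train[:1%]")
train_ds = train_ds.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
)

args = ORTTrainingArguments(
    output_dir="ort_results",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    optim="adamw_ort_fused",  # ONNX Runtime's fused AdamW implementation
)

trainer = ORTTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()

Because ORTTrainer subclasses the Transformers Trainer, existing training scripts usually need only the import and argument-class swap.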

Why use Optimum?

Optimum is a practical choice for anyone working with Hugging Face models who needs peak performance. It abstracts away low-level hardware specifics so developers can focus on the model itself, and its broad accelerator support and familiar APIs help models run faster and at lower computational cost during both inference and training.
