Optimum: Accelerate Hugging Face Models with Hardware Optimization

Summary
Optimum is an extension of Hugging Face Transformers, Diffusers, TIMM, and Sentence-Transformers, designed to provide a suite of optimization tools. It enables maximum efficiency for training and running models on targeted hardware, simplifying the process for developers. This library helps users achieve significant performance gains across various machine learning workflows.
Introduction
Optimum is a powerful extension of Hugging Face Transformers, Diffusers, TIMM, and Sentence-Transformers, providing a comprehensive set of optimization tools. Its primary goal is to enable maximum efficiency for training and running models on targeted hardware, all while maintaining ease of use for developers. Optimum helps accelerate your machine learning workflows by leveraging various hardware-specific optimizations. For more detailed information, refer to the official Optimum documentation.
Installation
Getting started with Optimum is straightforward. You can install the core library using pip:
```bash
python -m pip install optimum
```
For accelerator-specific features, Optimum provides additional installation options. These specialized installations ensure you have the necessary dependencies for optimizing models on particular hardware. Please consult the official documentation for detailed instructions on installing accelerator-specific components, such as those for ONNX Runtime, OpenVINO, or Habana.
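For example, the package publishes extras that pull in the dependencies for a given backend. The extra names below are the commonly documented ones, but they can change between releases, so verify them against the documentation for your version:

```bash
# ONNX Runtime backend
python -m pip install "optimum[onnxruntime]"

# Intel OpenVINO backend
python -m pip install "optimum[openvino]"

# Intel Gaudi (Habana) backend
python -m pip install "optimum[habana]"
```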
Examples
Optimum offers extensive capabilities for both accelerated inference and training across a variety of hardware and software ecosystems.
Accelerated Inference
Optimum provides multiple tools to export and run optimized models efficiently:
- ONNX / ONNX Runtime: Export models to the ONNX format and run them with the high-performance ONNX Runtime. Learn more about ONNX export and running ONNX models; a short export-and-run sketch follows this list.
- Intel OpenVINO: Optimize, quantize, and deploy deep learning models on Intel hardware. Detailed guides are available in the OpenVINO documentation.
- ExecuTorch: Leverage PyTorch’s native solution for on-device inference across mobile and edge devices. Explore ExecuTorch export.
- Quanto: A PyTorch quantization backend that allows for model quantization via the Python API or optimum-cli. Find more details and examples in the Quanto repository.
- NVIDIA TensorRT-LLM: Accelerate large language models on NVIDIA GPUs. Refer to the Optimum NVIDIA blog post for more information.
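To make the ONNX Runtime path concrete, here is a minimal sketch that exports a Transformers checkpoint to ONNX at load time and runs it through a standard pipeline. It assumes `optimum[onnxruntime]` is installed; the checkpoint name is only an example.

```python
# Minimal sketch: export a checkpoint to ONNX and run it with ONNX Runtime.
# Assumes `optimum[onnxruntime]` is installed; the checkpoint name is illustrative.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch weights to ONNX on the fly;
# the CLI equivalent is: optimum-cli export onnx --model <model_id> <output_dir>
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum makes ONNX Runtime inference straightforward."))
```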
Accelerated Training
Optimum also provides wrappers around the original Transformers Trainer, simplifying accelerated training on powerful hardware:
- Intel Gaudi Accelerators (HPU): Achieve optimal performance for training on first-gen Gaudi, Gaudi2, and Gaudi3. See the Intel Gaudi training documentation; a training sketch follows this list.
- AWS Trainium: Enable accelerated training on Trn1 and Trn1n instances. Explore AWS Trainium training tutorials.
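As an illustration of these wrappers, the sketch below swaps the Transformers Trainer for the Gaudi equivalents from the optimum-habana package. The class and argument names follow that package's documented API at the time of writing; the checkpoint, toy dataset, and Gaudi config names are illustrative, and running it requires an HPU.

```python
# Minimal sketch of Trainer-style fine-tuning on Intel Gaudi (HPU) with optimum-habana.
# Assumes `optimum[habana]` is installed and an HPU is available; names are illustrative.
from datasets import Dataset
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "bert-base-uncased"  # placeholder checkpoint
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tiny toy dataset, tokenized so the trainer can consume it directly
raw = Dataset.from_dict({"text": ["great service", "terrible service"], "label": [1, 0]})
train_dataset = raw.map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length", max_length=32)
)

# Gaudi-specific runtime settings published on the Hub for this model family
gaudi_config = GaudiConfig.from_pretrained("Habana/bert-base-uncased")

# Mirrors transformers.TrainingArguments, plus HPU-specific switches
training_args = GaudiTrainingArguments(
    output_dir="./gaudi-out",
    use_habana=True,        # run on HPU instead of CPU/GPU
    use_lazy_mode=True,     # lazy graph execution on Gaudi
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=gaudi_config,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The AWS Trainium wrappers in optimum-neuron follow the same pattern of drop-in Trainer replacements.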
Why use Optimum?
Optimum is an essential tool for anyone working with Hugging Face models who needs to achieve peak performance. It simplifies the complex process of hardware optimization, allowing developers to focus on model development rather than low-level hardware specifics. By providing broad support for various accelerators and offering easy-to-use APIs, Optimum helps your models run faster and at lower computational cost, during both inference and training.
Links
- GitHub Repository: huggingface/optimum
- Official Documentation: Optimum Documentation