LLaMA-Factory: Unified Efficient Fine-Tuning for 100+ LLMs & VLMs

Summary

LLaMA-Factory is an open-source project offering a unified and efficient framework for fine-tuning over 100 large language models (LLMs) and vision-language models (VLMs). Presented at ACL 2024, it bundles a comprehensive suite of tools and algorithms spanning a range of training approaches, streamlining the otherwise complex process of adapting powerful models to specific tasks at scale.

Introduction

LLaMA-Factory, developed by hiyouga, is a popular and robust framework for the unified, efficient fine-tuning of a vast array of large language models (LLMs) and vision-language models (VLMs). With over 62,000 stars and 7,500 forks on GitHub, it is a go-to solution for researchers and developers in the AI community. The project is written primarily in Python, licensed under Apache-2.0, and its accompanying paper was presented at ACL 2024.

Installation

Getting started with LLaMA-Factory is straightforward. You can install it directly from the source or use a pre-built Docker image.

To install from source, clone the repository and install the necessary dependencies:

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
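
After the install completes, you can sanity-check the setup from the shell. The version subcommand below follows the project's CLI; if it is unavailable in your installed version, llamafactory-cli help lists the supported commands:

llamafactory-cli version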

For users preferring Docker, a pre-built image is available, simplifying environment setup:

docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest
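
The flags above hand all GPUs and host shared memory to the container. In practice you will usually also want to persist the Hugging Face model cache and publish the Web UI port; a possible invocation, where the cache path and port 7860 (Gradio's default) are assumptions to adapt to your setup:

docker run -it --rm --gpus=all --ipc=host \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -p 7860:7860 \
  hiyouga/llamafactory:latest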

Examples

LLaMA-Factory provides intuitive command-line interface (CLI) commands for common tasks such as fine-tuning, inference, and model merging. Here are quickstart examples for the Llama3-8B-Instruct model:

To perform LoRA fine-tuning:

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
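
The YAML file passed to the CLI holds the full training recipe. The abridged sketch below illustrates the kind of fields such a config contains; the names follow the repository's example configs, but treat the shipped examples/train_lora/llama3_lora_sft.yaml as authoritative:

# Abridged LoRA SFT config sketch; consult the shipped YAML for the full version
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft                      # supervised fine-tuning
do_train: true
finetuning_type: lora           # train LoRA adapters instead of full weights
lora_target: all                # attach adapters to all linear layers
dataset: alpaca_en_demo
template: llama3                # chat template matching the base model
cutoff_len: 2048
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true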

To run inference with the fine-tuned model:

llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
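
The inference config points the chat CLI at the base model plus the trained adapters. A minimal sketch, again mirroring the repository's example file:

# Abridged inference config sketch
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft   # LoRA weights from the training run
template: llama3
finetuning_type: lora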

To merge the LoRA adapters back into the base model:

llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
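
Merging folds the adapter weights into the base model and writes a standalone checkpoint. A minimal config sketch, where the export_dir value is an illustrative path:

# Abridged merge/export config sketch
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
export_dir: output/llama3_lora_sft   # where the merged model is written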

Additionally, LLaMA-Factory offers a user-friendly Web UI for fine-tuning models in your browser:

llamafactory-cli webui
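
The Web UI is built on Gradio, so the usual Gradio environment variables apply. For example, to pick a port and expose a temporary public share link (the variable names are Gradio conventions; verify against your installed version):

GRADIO_SERVER_PORT=7860 GRADIO_SHARE=1 llamafactory-cli webui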

Why Use LLaMA-Factory

LLaMA-Factory is a powerful tool for anyone working with large language models, offering a wide range of features and benefits:

  • Extensive Model Support: It supports over 100 models, including popular ones like LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, DeepSeek, Yi, and Gemma, ensuring compatibility with the latest advancements.
  • Diverse Training Approaches: The framework integrates various methods such as supervised fine-tuning (SFT), reward modeling, PPO, DPO, KTO, and ORPO, catering to different training paradigms.
  • Scalable and Efficient Tuning: It supports 16-bit full-tuning, freeze-tuning, LoRA, and 2/3/4/5/6/8-bit QLoRA via multiple quantization techniques, allowing for efficient training even on limited hardware.
  • Advanced Algorithms and Tricks: LLaMA-Factory incorporates cutting-edge algorithms like GaLore, BAdam, APOLLO, DoRA, LongLoRA, and PiSSA, alongside practical tricks such as FlashAttention-2, Unsloth, and RoPE scaling for enhanced performance.
  • Comprehensive Experiment Monitoring: It integrates with popular experiment monitors like LlamaBoard, TensorBoard, Wandb, and SwanLab, providing robust tracking and visualization capabilities.
  • Faster Inference: The platform offers faster inference through an OpenAI-style API, Gradio UI, and CLI, leveraging backends like vLLM and SGLang for high-throughput deployments (see the API sketch after this list).

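The OpenAI-style API is served with the api subcommand against an inference config, after which any OpenAI-compatible client can talk to it. A hedged sketch: the API_PORT variable and default port 8000 follow the project's documentation, but verify against your version:

# Serve the fine-tuned model behind an OpenAI-compatible endpoint
API_PORT=8000 llamafactory-cli api examples/inference/llama3_lora_sft.yaml

# Query it with a standard chat-completions request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
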
Links

Explore LLaMA-Factory further through these official resources:

  • GitHub repository: https://github.com/hiyouga/LLaMA-Factory