LLaMA-Factory: Unified Efficient Fine-Tuning for 100+ LLMs & VLMs

Summary

LLaMA-Factory is an open-source project offering a unified and efficient framework for fine-tuning over 100 large language models (LLMs) and vision-language models (VLMs). Presented at ACL 2024, it bundles a comprehensive suite of tools and algorithms spanning a range of training approaches, streamlining the otherwise complex process of adapting powerful models to specific tasks at scale.

Introduction

LLaMA-Factory, developed by hiyouga, is a popular and robust framework for the unified, efficient fine-tuning of a vast array of large language models (LLMs) and vision-language models (VLMs). With over 62,000 stars and 7,500 forks on GitHub, it is a go-to solution for researchers and developers in the AI community. The project is written primarily in Python, licensed under Apache-2.0, and its accompanying paper was presented at ACL 2024.

Installation

Getting started with LLaMA-Factory is straightforward. You can install it directly from the source or use a pre-built Docker image.

To install from source, clone the repository and install the necessary dependencies:

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
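
After the install completes, you can sanity-check the setup from the shell. The version subcommand below follows the project's CLI; if it is unavailable in your installed version, llamafactory-cli help lists the supported commands:

llamafactory-cli version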

For users preferring Docker, a pre-built image is available, simplifying environment setup:

docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest
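
The flags above hand all GPUs and host shared memory to the container. In practice you will usually also want to persist the Hugging Face model cache and publish the Web UI port; a possible invocation, where the cache path and port 7860 (Gradio's default) are assumptions to adapt to your setup:

docker run -it --rm --gpus=all --ipc=host \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -p 7860:7860 \
  hiyouga/llamafactory:latest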

Examples

LLaMA-Factory provides intuitive command-line interface (CLI) commands for common tasks such as fine-tuning, inference, and model merging. Here are quickstart examples for the Llama3-8B-Instruct model:

To perform LoRA fine-tuning:

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
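
The YAML file passed to the CLI holds the full training recipe. The abridged sketch below illustrates the kind of fields such a config contains; the names follow the repository's example configs, but treat the shipped examples/train_lora/llama3_lora_sft.yaml as authoritative:

# Abridged LoRA SFT config sketch; consult the shipped YAML for the full version
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft                      # supervised fine-tuning
do_train: true
finetuning_type: lora           # train LoRA adapters instead of full weights
lora_target: all                # attach adapters to all linear layers
dataset: alpaca_en_demo
template: llama3                # chat template matching the base model
cutoff_len: 2048
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true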

To run inference with the fine-tuned model:

llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
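
The inference config points the chat CLI at the base model plus the trained adapters. A minimal sketch, again mirroring the repository's example file:

# Abridged inference config sketch
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft   # LoRA weights from the training run
template: llama3
finetuning_type: lora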

To merge the LoRA adapters back into the base model:

llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
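
Merging folds the adapter weights into the base model and writes a standalone checkpoint. A minimal config sketch, where the export_dir value is an illustrative path:

# Abridged merge/export config sketch
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
export_dir: output/llama3_lora_sft   # where the merged model is written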

Additionally, LLaMA-Factory offers a user-friendly Web UI for fine-tuning models in your browser:

llamafactory-cli webui
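
The Web UI is built on Gradio, so the usual Gradio environment variables apply. For example, to pick a port and expose a temporary public share link (the variable names are Gradio conventions; verify against your installed version):

GRADIO_SERVER_PORT=7860 GRADIO_SHARE=1 llamafactory-cli webui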

Why Use LLaMA-Factory

LLaMA-Factory is a powerful tool for anyone working with large language models, offering a wide range of features and benefits:

  • Extensive Model Support: It supports over 100 models, including popular ones like LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, DeepSeek, Yi, and Gemma, ensuring compatibility with the latest advancements.
  • Diverse Training Approaches: The framework integrates various methods such as supervised fine-tuning (SFT), reward modeling, PPO, DPO, KTO, and ORPO, catering to different training paradigms.
  • Scalable and Efficient Tuning: It supports 16-bit full-tuning, freeze-tuning, LoRA, and 2/3/4/5/6/8-bit QLoRA via multiple quantization techniques, allowing for efficient training even on limited hardware.
  • Advanced Algorithms and Tricks: LLaMA-Factory incorporates cutting-edge algorithms like GaLore, BAdam, APOLLO, DoRA, LongLoRA, and PiSSA, alongside practical tricks such as FlashAttention-2, Unsloth, and RoPE scaling for enhanced performance.
  • Comprehensive Experiment Monitoring: It integrates with popular experiment monitors like LlamaBoard, TensorBoard, Wandb, and SwanLab, providing robust tracking and visualization capabilities.
  • Faster Inference: The platform offers faster inference through an OpenAI-style API, Gradio UI, and CLI, leveraging backends like vLLM and SGLang for high-throughput deployments (see the API sketch after this list).

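The OpenAI-style API is served with the api subcommand against an inference config, after which any OpenAI-compatible client can talk to it. A hedged sketch: the API_PORT variable and default port 8000 follow the project's documentation, but verify against your version:

# Serve the fine-tuned model behind an OpenAI-compatible endpoint
API_PORT=8000 llamafactory-cli api examples/inference/llama3_lora_sft.yaml

# Query it with a standard chat-completions request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
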
Links

Explore LLaMA-Factory further through these official resources:

  • GitHub repository: https://github.com/hiyouga/LLaMA-Factory