LLaMA-Factory: Unified Efficient Fine-Tuning for 100+ LLMs & VLMs

Summary
LLaMA-Factory is an open-source project offering a unified, efficient framework for fine-tuning over 100 large language models (LLMs) and vision-language models (VLMs). Recognized at ACL 2024, it provides a comprehensive suite of tools and algorithms spanning a wide range of training approaches, streamlining the otherwise complex process of adapting powerful models to specific tasks at scale.
Introduction
LLaMA-Factory, developed by hiyouga, is a highly popular and robust framework designed for the unified and efficient fine-tuning of a vast array of large language models (LLMs) and vision-language models (VLMs). With over 62,000 stars and 7,500 forks on GitHub, it stands out as a go-to solution for researchers and developers in the AI community. The project, written primarily in Python and licensed under Apache-2.0, was recognized at ACL 2024 for its significant contributions to the field of efficient model adaptation.
Installation
Getting started with LLaMA-Factory is straightforward. You can install it directly from the source or use a pre-built Docker image.
To install from source, clone the repository and install the necessary dependencies:
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
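After installation, a quick sanity check confirms the CLI is on your PATH; the version subcommand prints the installed release:

llamafactory-cli version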
For users preferring Docker, a pre-built image is available, simplifying environment setup:
docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest
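For real workloads you will typically want to reuse your local Hugging Face cache and expose the Web UI and API ports; the following variant is a sketch, where the cache path and port numbers are common defaults rather than requirements:

# Mount the Hugging Face cache so downloaded models persist across runs,
# and publish the Web UI (7860) and API (8000) ports (paths/ports are assumptions).
docker run -it --rm --gpus=all --ipc=host \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -p 7860:7860 -p 8000:8000 \
  hiyouga/llamafactory:latest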
Examples
LLaMA-Factory provides an intuitive command-line interface (CLI) for common tasks such as fine-tuning, inference, and model merging. Here are quickstart examples using the Llama3-8B-Instruct model:
To perform LoRA fine-tuning:
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
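The training configuration lives in a single YAML file that bundles the model, method, dataset, and trainer settings. Below is a minimal sketch of such a config written out via a shell heredoc; the keys are standard LLaMA-Factory options, but the specific values and the llama3_sft_demo.yaml filename are illustrative assumptions, not a verbatim copy of the shipped example:

# Write an illustrative SFT config (values are assumptions; adjust for your hardware and data).
cat > llama3_sft_demo.yaml <<'EOF'
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft                  # supervised fine-tuning
do_train: true
finetuning_type: lora       # train LoRA adapters instead of full weights
lora_target: all            # attach adapters to all linear layers

### dataset
dataset: alpaca_en_demo     # a demo dataset registered in the repo's dataset_info.json
template: llama3            # chat template matching the base model
cutoff_len: 2048

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true
EOF

# Launch training with the custom config.
llamafactory-cli train llama3_sft_demo.yaml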
To run inference with the fine-tuned model:
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
To merge the LoRA adapters back into the base model:
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
Additionally, LLaMA-Factory offers LLaMA Board, a user-friendly Web UI for fine-tuning models in your browser:
llamafactory-cli webui
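Since the Web UI is a Gradio app, the standard Gradio environment variables should apply; a hedged example (GRADIO_SERVER_PORT and GRADIO_SHARE are Gradio conventions and assumed to be honored here):

# Serve the UI on a fixed port and request a temporary public share link.
GRADIO_SERVER_PORT=7860 GRADIO_SHARE=1 llamafactory-cli webui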
Why Use LLaMA-Factory
LLaMA-Factory is a powerful tool for anyone working with large language models, offering a wide range of features and benefits:
- Extensive Model Support: It supports over 100 models, including popular ones like LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, DeepSeek, Yi, and Gemma, ensuring compatibility with the latest advancements.
- Diverse Training Approaches: The framework integrates various methods such as supervised fine-tuning (SFT), reward modeling, PPO, DPO, KTO, and ORPO, catering to different training paradigms.
- Scalable and Efficient Tuning: It supports 16-bit full-tuning, freeze-tuning, LoRA, and 2/3/4/5/6/8-bit QLoRA via quantization backends such as AQLM, AWQ, GPTQ, LLM.int8, HQQ, and EETQ, enabling efficient training even on limited hardware.
- Advanced Algorithms and Tricks: LLaMA-Factory incorporates cutting-edge algorithms like GaLore, BAdam, APOLLO, DoRA, LongLoRA, and PiSSA, alongside practical tricks such as FlashAttention-2, Unsloth, and RoPE scaling for enhanced performance.
- Comprehensive Experiment Monitoring: It integrates with popular experiment monitors like LlamaBoard, TensorBoard, Wandb, and SwanLab, providing robust tracking and visualization capabilities.
- Faster Inference: The platform offers faster inference through an OpenAI-style API, Gradio UI, and CLI, leveraging backends like vLLM and SGLang for high-throughput deployments (a concrete API sketch follows this list).
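As a concrete illustration of the inference stack, the CLI can serve a model behind an OpenAI-compatible endpoint, which any OpenAI-style client can then query. This is a sketch: the config path reuses the inference example above, and the API_PORT variable and request route are assumptions based on common OpenAI-compatible servers:

# Serve the fine-tuned model via the OpenAI-style API (port via API_PORT is an assumption).
API_PORT=8000 llamafactory-cli api examples/inference/llama3_lora_sft.yaml

# Query it with a standard OpenAI-style chat completion request.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'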
Links
Explore LLaMA-Factory further through these official resources:
- GitHub Repository: https://github.com/hiyouga/LLaMA-Factory
- Official Documentation: LLaMA-Factory Docs
- Colab Notebook: Open in Colab
- Discord Community: Join Discord
- Twitter: Follow @llamafactory_ai