maestro: Streamlining Fine-Tuning for Multimodal Models like PaliGemma 2 and Florence-2

Summary
maestro is a tool from Roboflow that accelerates fine-tuning of multimodal models by encapsulating best practices for configuration, data loading, reproducibility, and training loop setup. It ships ready-to-use recipes for popular vision-language models, including Florence-2, PaliGemma 2, and Qwen2.5-VL.
Introduction
maestro is a streamlined tool developed by Roboflow to accelerate the fine-tuning of multimodal models. By encapsulating best practices, maestro simplifies complex tasks such as configuration, data loading, reproducibility, and training loop setup. It currently provides ready-to-use recipes for popular vision-language models, including Florence-2, PaliGemma 2, and Qwen2.5-VL, making advanced model customization more accessible.
Installation
To get started with maestro, install the dependencies for the specific model you plan to fine-tune. It is recommended to create a dedicated Python environment for each model, because the extras for different models may pin conflicting dependency versions.
pip install "maestro[paligemma_2]"
Replace paligemma_2 with the specific model you intend to use, for example, florence_2 or qwen2_5_vl.
Examples
maestro offers both a command-line interface (CLI) and a Python API for fine-tuning your models. Additionally, the repository provides convenient Colab notebooks for hands-on experimentation.
Command-Line Interface (CLI)
Kick off fine-tuning directly from your terminal by specifying key parameters like dataset location, epochs, batch size, optimization strategy, and metrics.
maestro paligemma_2 train \
  --dataset "dataset/location" \
  --epochs 10 \
  --batch-size 4 \
  --optimization_strategy "qlora" \
  --metrics "edit_distance"
Python API
For greater control and integration into existing workflows, use the Python API. Import the train function from the corresponding module and define your configuration in a dictionary.
from maestro.trainer.models.paligemma_2.core import train

config = {
    "dataset": "dataset/location",
    "epochs": 10,
    "batch_size": 4,
    "optimization_strategy": "qlora",
    "metrics": ["edit_distance"],
}

train(config)
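The "edit_distance" metric in the examples above compares generated text against ground-truth strings. maestro's exact implementation and any normalization are not shown here, but as the name suggests, the underlying idea is Levenshtein distance: the minimum number of single-character edits needed to turn one string into another. A minimal sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev holds the previous row of the classic dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (free on match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

A lower score means the model's output is closer to the expected answer, which makes this metric a natural fit for text-generation tasks like JSON data extraction.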
Colab Notebooks
Explore practical examples and fine-tune models directly in Google Colab. The maestro repository includes several cookbooks, such as:
- Florence-2 (0.9B) object detection with LoRA
- PaliGemma 2 (3B) JSON data extraction with LoRA
- Qwen2.5-VL (3B) JSON data extraction with QLoRA
Why Use maestro?
maestro stands out by simplifying the often-complex process of fine-tuning multimodal models. Its key advantages include:
- Streamlined Workflow: Accelerates the entire fine-tuning process, from setup to training.
- Best Practices Encapsulated: Handles configuration, data loading, reproducibility, and training loop setup, allowing users to focus on their data and models.
- Ready-to-Use Recipes: Provides pre-configured setups for popular models like Florence-2, PaliGemma 2, and Qwen2.5-VL.
- Hardware Efficiency: Supports optimization strategies like LoRA, QLoRA, and graph freezing to keep hardware requirements in check.
- Consistent Data Handling: Utilizes a consistent JSONL format to streamline data preparation.
- Unified Interface: Offers a single CLI/SDK to reduce code complexity across different models and tasks.
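The consistent JSONL format mentioned above stores one JSON object per line, which keeps datasets easy to stream and inspect. As an illustration only (the field names `image`, `prefix`, and `suffix` are an assumption about the schema, not a confirmed spec), writing and reading such an annotation file looks like:

```python
import json

# Illustrative JSONL annotations: each line pairs an image file with a
# prompt ("prefix") and the expected model response ("suffix").
# These field names are an assumption, not maestro's documented schema.
records = [
    {"image": "images/0001.jpg",
     "prefix": "extract data in JSON format",
     "suffix": '{"total": 12.5}'},
    {"image": "images/0002.jpg",
     "prefix": "extract data in JSON format",
     "suffix": '{"total": 7.0}'},
]

# Write: one serialized JSON object per line.
with open("annotations.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read back: parse each line independently.
with open("annotations.jsonl") as f:
    loaded = [json.loads(line) for line in f]

print(len(loaded))  # 2
```

Because every line is self-contained, the same loading code works regardless of which model or task the annotations target.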
Links
- GitHub Repository: https://github.com/roboflow/maestro
- Official Colab Notebook: https://colab.research.google.com/github/roboflow/maestro/blob/develop/cookbooks/maestro_qwen2_5_vl_json_extraction.ipynb
- Join the Discord Community: https://discord.gg/GbfgXGJ8Bk
- Contributing Guide: https://github.com/roboflow/maestro/blob/develop/CONTRIBUTING.md
- GitHub Discussions: https://github.com/roboflow/maestro/discussions