Repository History

Explore all analyzed open source repositories

Topic: multimodal

maestro: Streamlining Fine-Tuning for Multimodal Models like PaliGemma 2 and Florence-2

maestro is a tool for accelerating the fine-tuning of multimodal models. It encapsulates best practices and handles configuration, data loading, reproducibility, and training loop setup. The project currently offers ready-to-use recipes for popular vision-language models, including Florence-2, PaliGemma 2, and Qwen2.5-VL.

Mar 2, 2026

Rerun: Open-Source SDK for Multimodal Data Logging and Visualization

Rerun is an open-source SDK for logging, storing, querying, and visualizing multimodal and multi-rate data. It provides SDKs for C++, Python, and Rust, letting developers stream data to a viewer for live visualization or later analysis. It is particularly useful for debugging and understanding systems in robotics, computer vision, and spatial AI.

Jan 2, 2026

SmolVLM Real-time Webcam: Real-time Object Detection with llama.cpp

The `smolvlm-realtime-webcam` repository provides a simple demo of real-time object detection from a webcam feed. It uses the SmolVLM 500M model and the `llama.cpp` server, offering an accessible way to explore multimodal AI running locally. Users can set up the project and interact with a live AI vision system.

Oct 11, 2025
