Repository History
Explore all analyzed open source repositories

maestro: Streamlining Fine-Tuning for Multimodal Models like PaliGemma 2 and Florence-2
maestro is a powerful tool designed to accelerate the fine-tuning process for multimodal models. It encapsulates best practices, handling configuration, data loading, reproducibility, and training loop setup efficiently. The project currently offers ready-to-use recipes for popular vision-language models, including Florence-2, PaliGemma 2, and Qwen2.5-VL.

Rerun: Open Source SDK for Multimodal Data Logging and Visualization
Rerun is an open-source SDK designed for logging, storing, querying, and visualizing complex multimodal and multi-rate data. It provides SDKs for C++, Python, and Rust, enabling developers to stream data to a powerful viewer for live visualization or later analysis. This tool is particularly valuable for debugging and understanding systems in robotics, computer vision, and spatial AI.

SmolVLM Real-time Webcam: Real-time Object Detection with Llama.cpp
The `smolvlm-realtime-webcam` repository provides a simple, yet powerful, demo for real-time object detection using a webcam. It leverages the SmolVLM 500M model and the `llama.cpp` server, offering an accessible way to explore local multimodal AI capabilities. This project allows users to easily set up and interact with a live AI vision system.