Repository History
80 repositories tagged with Machine Learning

TRELLIS: Structured 3D Latents for Scalable and Versatile 3D Generation
TRELLIS is the official repository for a CVPR'25 Spotlight paper on "Structured 3D Latents for Scalable and Versatile 3D Generation." This Microsoft project introduces a powerful model for generating high-quality 3D assets from text or image prompts. It supports diverse output formats like Radiance Fields, 3D Gaussians, and meshes, offering flexible editing capabilities.

Jieba: The Leading Python Library for Chinese Text Segmentation
Jieba is a highly popular and efficient Python library designed for Chinese text segmentation. It offers various cutting modes, including accurate, full, and search engine modes, making it versatile for different NLP tasks. With features like custom dictionaries and part-of-speech tagging, Jieba provides a comprehensive solution for processing Chinese text.

AudioSep: Foundation Model for Open-Domain Sound Separation with Language Queries
AudioSep is a groundbreaking foundation model for open-domain sound separation, allowing users to isolate specific sounds using natural language descriptions. It demonstrates strong performance and impressive zero-shot generalization across various tasks, including audio event, musical instrument, and speech separation. This powerful tool simplifies complex audio processing with intuitive text-based queries.

JAX: Composable Transformations for Python+NumPy Programs
JAX is a Python library for high-performance numerical computing and large-scale machine learning. It offers composable function transformations such as automatic differentiation, JIT compilation for accelerators (GPU/TPU), and auto-vectorization. Because these transformations compose freely, developers can write flexible and efficient numerical programs.
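
A minimal sketch of how the three transformations named above compose, using `jax.grad`, `jax.vmap`, and `jax.jit`. The toy loss function and the sample data are assumptions made for illustration, not taken from the JAX repository.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    # Squared error of a linear prediction for a single example.
    return (jnp.dot(w, x) - y) ** 2


# grad: derivative of loss with respect to its first argument (w).
grad_loss = jax.grad(loss)

# vmap: apply the per-example gradient across a batch of (x, y) pairs
# while sharing the same w (in_axes=None means "do not map over w").
batched_grad = jax.vmap(grad_loss, in_axes=(None, 0, 0))

# jit: compile the composed function with XLA for the available backend.
fast_batched_grad = jax.jit(batched_grad)

w = jnp.array([1.0, 2.0])
xs = jnp.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ys = jnp.array([1.0, 0.0, 3.0])

grads = fast_batched_grad(w, xs, ys)
print(grads.shape)  # one gradient row per example
```

Each transformation takes a function and returns a function, which is why they can be stacked in any order that makes sense for the program.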

index-tts-lora: High-Quality Speech Synthesis with LoRA Fine-tuning
index-tts-lora offers a robust solution for high-quality speech synthesis, leveraging LoRA fine-tuning on the index-tts framework. It significantly enhances prosody and naturalness for both single and multi-speaker voices. This project provides practical methods for training and inference, making advanced voice synthesis more accessible.

Infinity: High-Throughput, Low-Latency Serving for Text Embeddings and Reranking
Infinity is a high-throughput, low-latency REST API for serving AI models, including text-embedding, reranking, and multi-modal models. It supports deploying any model from HuggingFace with fast inference backends optimized for diverse accelerators. This engine simplifies the deployment and usage of advanced AI models for developers.

LLMBox: A Comprehensive Python Library for LLM Training and Evaluation
LLMBox is a comprehensive Python library for implementing Large Language Models, offering a unified training pipeline and extensive model evaluation capabilities. It provides a one-stop solution for both training and utilizing LLMs, emphasizing flexibility and efficiency. Developers can draw on its diverse training strategies and fast inference for their LLM projects.

Rio: Build Web and Desktop Apps in Pure Python, No JavaScript Needed
Rio is an innovative Python framework that allows developers to create web and desktop applications using pure Python, eliminating the need for HTML, CSS, or JavaScript. It provides a modern, declarative UI approach with over 50 built-in components, making app development efficient and enjoyable. With Rio, you can build powerful, type-safe applications that run seamlessly across different environments.

Magenta RT: Live Music Generation on Your Local Device
Magenta RealTime (Magenta RT) is an open-source Python library for live music audio generation on local devices. It allows users to create music using both text and audio prompts, serving as a powerful tool for real-time creative audio exploration. This library is the on-device companion to Google's MusicFX DJ Mode and the Lyria RealTime API.

NUDGE: Lightweight Non-Parametric Embedding Fine-Tuning for Retrieval
NUDGE is a lightweight, non-parametric tool for fine-tuning pre-trained embeddings to improve retrieval and RAG pipelines. It adjusts the data embeddings directly, rather than modifying model parameters, to maximize accuracy. This approach often improves retrieval accuracy by more than 10% and runs in minutes.
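
The idea of adjusting data embeddings rather than model parameters can be illustrated with a toy sketch. This is not NUDGE's actual optimization procedure: the function names, the fixed `step` size, and the two-dimensional vectors are all hypothetical, chosen only to show a stored embedding being moved toward the training queries it should answer.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]


def nudge_embeddings(doc_embs, train_queries, step=0.5):
    """Move each document embedding toward its labeled queries (toy rule).

    train_queries is a list of (query_embedding, relevant_doc_index) pairs.
    The embedding model itself is never touched; only stored vectors change.
    """
    dim = len(doc_embs[0])
    deltas = [[0.0] * dim for _ in doc_embs]
    for query, doc_idx in train_queries:
        for j in range(dim):
            deltas[doc_idx][j] += query[j]

    nudged = []
    for emb, delta in zip(doc_embs, deltas):
        size = math.sqrt(dot(delta, delta))
        if size == 0.0:
            nudged.append(list(emb))  # no training signal: leave unchanged
            continue
        # Bounded move of length `step`, then back onto the unit sphere.
        moved = [e + step * d / size for e, d in zip(emb, delta)]
        nudged.append(normalize(moved))
    return nudged


def retrieve(query, doc_embs):
    # Index of the document with the highest inner-product similarity.
    return max(range(len(doc_embs)), key=lambda i: dot(query, doc_embs[i]))


docs = [[1.0, 0.0], [0.6, 0.8]]
q_a = normalize([0.85, 0.5])  # labeled as relevant to document 0
q_b = normalize([0.3, 0.9])   # labeled as relevant to document 1

before = retrieve(q_a, docs)
tuned = nudge_embeddings(docs, [(q_a, 0), (q_b, 1)])
after = retrieve(q_a, tuned)
print(before, after)
```

In this toy data the first query is initially matched to the wrong document and is matched correctly after the stored vectors are adjusted, with no change to the encoder.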

maestro: Streamlining Fine-Tuning for Multimodal Models like PaliGemma 2 and Florence-2
maestro is a powerful tool designed to accelerate the fine-tuning process for multimodal models. It encapsulates best practices, handling configuration, data loading, reproducibility, and training loop setup efficiently. The project currently offers ready-to-use recipes for popular vision-language models, including Florence-2, PaliGemma 2, and Qwen2.5-VL.

KBLaM: Knowledge Base Augmented Language Models for Enhanced LLMs
KBLaM, developed by Microsoft, is the official implementation of "Knowledge Base Augmented Language Models," presented at ICLR 2025. The method enhances Large Language Models by integrating external knowledge bases directly, offering an efficient alternative to traditional Retrieval-Augmented Generation (RAG) and in-context learning. It eliminates external retrieval modules, and its compute cost scales linearly with knowledge base size rather than quadratically.
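
The linear-versus-quadratic claim can be made concrete with a back-of-the-envelope count of attention score computations. The sketch below is an illustration, not code from the KBLaM repository: it assumes each knowledge base entry is represented by a single knowledge token that prompt tokens attend to, while in-context learning places every entry in the prompt as plain text. The function names and the figure of 10 tokens per entry are hypothetical.

```python
def in_context_pairs(num_entries, tokens_per_entry, prompt_tokens):
    # Full self-attention over the knowledge text plus the prompt:
    # every token attends to every token.
    total = num_entries * tokens_per_entry + prompt_tokens
    return total * total


def knowledge_token_pairs(num_entries, prompt_tokens):
    # Each prompt token attends to all knowledge tokens and to the prompt;
    # knowledge tokens do not attend to one another.
    return prompt_tokens * (num_entries + prompt_tokens)


prompt_tokens = 50
for num_entries in (1_000, 2_000, 4_000):
    quadratic = in_context_pairs(num_entries, 10, prompt_tokens)
    linear = knowledge_token_pairs(num_entries, prompt_tokens)
    print(num_entries, quadratic, linear)
```

Doubling the number of entries roughly quadruples the first count but adds only a fixed increment to the second, which is the scaling difference the description refers to.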