Repository History
187 repositories tagged with AI

Optimum: Accelerate Hugging Face Models with Hardware Optimization
Optimum is an extension of Hugging Face Transformers, Diffusers, TIMM, and Sentence-Transformers that provides a suite of optimization tools. It helps developers maximize training and inference efficiency on targeted hardware while keeping the workflow simple, which can yield significant performance gains across a range of machine learning tasks.

bolt.diy: AI-Powered Full-Stack Web Development with Any LLM in Your Browser
bolt.diy is an open-source project that lets developers prompt, run, edit, and deploy full-stack web applications directly in the browser. It supports more than 19 different Large Language Models (LLMs), so users can choose their preferred model for code generation and other development tasks. The result is a streamlined, highly customizable workflow for AI-assisted coding.

notesGPT: AI-Powered Voice Notes with Transcription and Summarization
notesGPT is an open-source project that lets users record voice notes and then use AI to transcribe them, summarize them, and extract actionable tasks. Built on Convex, Next.js, and Together.ai, it turns spoken ideas into organized information, making it useful for anyone who wants to manage voice recordings more productively.

Vexa: Self-Hosted Meeting Intelligence Platform with Real-Time Transcripts
Vexa is an open-source, self-hostable meeting intelligence platform for real-time transcription on Google Meet and Microsoft Teams. It exposes a multi-user API that deploys bots into meetings, and because it can be self-hosted, organizations keep control over their data and can choose a deployment model that fits their needs. Vexa is written in Python and supports real-time multilingual transcription and translation.

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper2Code is a multi-agent LLM system that automates the generation of code repositories directly from machine learning research papers. It uses a three-stage pipeline of planning, analysis, and code generation, with each stage handled by specialized agents. The authors report that this approach produces faithful, high-quality implementations and outperforms existing baselines on relevant benchmarks.

big_vision: Google Research's Codebase for Large-Scale Vision Models
big_vision is Google Research's official codebase for training large-scale vision models with JAX/Flax. It was used to develop prominent architectures such as Vision Transformer, SigLIP, and MLP-Mixer. The repository gives researchers a solid starting point for scalable vision experiments on GPUs and Cloud TPUs, from a single core up to distributed setups.

NVIDIA Isaac GR00T: A Foundation Model for Generalist Robots
NVIDIA Isaac GR00T N1.6 is an open vision-language-action (VLA) foundation model for generalist humanoid robot skills. It takes multimodal input, including language and images, to perform manipulation tasks across diverse environments. Researchers and practitioners can fine-tune the model on custom datasets and deploy it for inference.

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation
HunyuanVideo-Avatar is a Tencent-Hunyuan project for high-fidelity, audio-driven human animation. Built on a multimodal diffusion transformer, it generates dynamic, emotion-controllable, multi-character dialogue videos. The system targets key challenges in character consistency, emotion alignment, and multi-character animation, making it suitable for applications such as e-commerce and social media.

context-engineering-intro: Master AI Coding Assistants with Context Engineering
Context Engineering goes beyond traditional prompt engineering by giving AI coding assistants the comprehensive information they need to complete tasks end to end. The coleam00/context-engineering-intro repository provides a template and a step-by-step guide for applying this discipline. With tools like Claude Code, it helps developers build complex features more consistently and with fewer failures.

OmniParser: A Vision-Based Tool for GUI Agent Screen Parsing
OmniParser is a Microsoft tool that parses user interface screenshots into structured, understandable elements. It significantly improves the ability of vision-based models such as GPT-4V to generate accurate actions grounded in specific regions of a GUI. The project aims to advance pure vision-based GUI agents through robust screen parsing.

Memori: SQL Native Memory Layer for LLMs and AI Agents
Memori is an SQL-native memory layer for LLMs, AI agents, and multi-agent systems. It offers a flexible way to manage both long-term and short-term memory and integrates with existing software and infrastructure. The project aims to give AI systems persistent, structured memory, making them more context-aware.

Clarity-Upscaler: Free and Open-Source AI Image Upscaler & Enhancer
Clarity-Upscaler is an open-source AI image upscaler and enhancer that offers a free alternative to tools like Magnific. Written in Python, it provides high-resolution image generation and enhancement and supports several integration methods for both developers and end users.