UI-TARS-desktop: The Open-Source Multimodal AI Agent Stack

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

UI-TARS-desktop is an open-source multimodal AI agent stack from ByteDance that connects AI models with agent infrastructure. It ships two projects: Agent TARS, a general multimodal AI agent with a CLI and Web UI, and UI-TARS Desktop, a native GUI agent for controlling a local or remote computer or browser. The goal is human-like task completion, combining multimodal capabilities with integration into real-world tools.

Repository Information

Analyzed by OSRepos on May 6, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

The UI-TARS-desktop stack comprises two projects. Agent TARS is a general-purpose multimodal AI agent, and UI-TARS Desktop is a native GUI agent. Both aim at human-like task completion through multimodal capabilities, such as GUI operation and vision, and through integration with real-world tools.

Installation

The Agent TARS CLI requires Node.js version 22 or higher.
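
One way to confirm the Node.js requirement before launching is a small shell check. This is a minimal sketch and not part of the project: the function names node_major and node_version_ok are made up for illustration, and it relies on node --version printing a string such as v22.3.0.

```shell
#!/bin/sh
# Minimum Node.js major version stated in this guide.
REQUIRED_NODE_MAJOR=22

# Extract the major version from a string such as "v22.3.0".
# (Hypothetical helper, named for this example only.)
node_major() {
  version="${1#v}"                  # drop the leading "v"
  printf '%s\n' "${version%%.*}"    # keep everything before the first dot
}

# Succeed (exit status 0) when the given version meets the minimum.
node_version_ok() {
  [ "$(node_major "$1")" -ge "$REQUIRED_NODE_MAJOR" ]
}

# Report on the installed Node.js, if there is one on PATH.
if command -v node >/dev/null 2>&1; then
  if node_version_ok "$(node --version)"; then
    echo "Node.js $(node --version) is new enough for Agent TARS CLI"
  else
    echo "Node.js $(node --version) is too old; need ${REQUIRED_NODE_MAJOR}+" >&2
  fi
else
  echo "Node.js not found on PATH" >&2
fi
```

The version test lives in its own function so it can be reused, for example in a wrapper script that refuses to call npx when the check fails.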

You can launch it directly using npx:

npx @agent-tars/cli@latest

Alternatively, install it globally:

npm install @agent-tars/cli@latest -g

Then, run it with your preferred model provider and API key:

agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
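
Typing an API key directly on the command line leaves it in shell history. A wrapper that reads the key from the environment is one way around that. The sketch below uses only the --provider, --model, and --apiKey flags shown above; the variable names AGENT_TARS_API_KEY and DRY_RUN and the function run_agent_tars are assumptions made for this example, not names defined by the project.

```shell
#!/bin/sh
# AGENT_TARS_API_KEY is a hypothetical variable name chosen for this sketch.
# Export it from your shell profile or a secrets manager before calling the function.

# Launch Agent TARS with the flags shown above, taking the key from the environment.
# With DRY_RUN=1 the command is printed with the key masked instead of being executed.
run_agent_tars() {
  provider="$1"
  model="$2"
  if [ -z "${AGENT_TARS_API_KEY:-}" ]; then
    echo "AGENT_TARS_API_KEY is not set" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "agent-tars --provider $provider --model $model --apiKey ****"
    return 0
  fi
  agent-tars --provider "$provider" --model "$model" --apiKey "$AGENT_TARS_API_KEY"
}
```

For example, running DRY_RUN=1 run_agent_tars anthropic claude-3-7-sonnet-latest shows the command that would be launched without starting the agent or revealing the key.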

For comprehensive setup instructions, refer to the Quick Start guide.

Examples

The project's showcase covers the following automation scenarios.

Agent TARS Showcase

  • Flight Booking: Automate complex tasks like booking flights.
    • Instruction: "Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline."
  • Hotel Booking & Information Gathering:
    • Instruction: "I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me."
  • Chart Generation: Integrate with external tools for data visualization.

For more use cases, explore the GitHub Issues showcase.

Why Use UI-TARS-desktop?

UI-TARS-desktop provides the following for AI-driven automation:

  • One-Click Out-of-the-box CLI: Easy execution with both headful Web UI and headless server options.
  • Hybrid Browser Agent: Flexible browser control using GUI Agent, DOM, or a combined strategy.
  • Event Stream: Protocol-driven event stream for context engineering and agent UI development.
  • MCP Integration: Kernel built on MCP, supporting the mounting of MCP Servers for real-world tool connections.
  • Natural Language Control: Powered by Vision-Language Models for intuitive interaction.
  • Screenshot and Visual Recognition: Enhanced understanding of interfaces through visual capabilities.
  • Precise Control: Accurate mouse and keyboard control for detailed operations.
  • Cross-Platform Support: Compatibility across Windows, macOS, and browser environments.
  • Real-time Feedback: Immediate status display for ongoing tasks.
  • Private and Secure: Fully local processing ensures data privacy and security.

