UI-TARS-desktop: The Open-Source Multimodal AI Agent Stack

Summary

UI-TARS-desktop is an open-source multimodal AI agent stack from ByteDance that connects AI models with agent infrastructure. It ships two projects: Agent TARS, a general-purpose multimodal agent with a CLI and Web UI, and UI-TARS Desktop, a native GUI agent for controlling local and remote computers and browsers. The goal is human-like task completion through multimodal capabilities and integration with real-world tools.

Repository Info

Updated on May 6, 2026

Introduction

The stack comprises two primary projects: Agent TARS, a general multimodal AI agent, and UI-TARS Desktop, a native GUI agent. Both aim to complete tasks the way a person would, combining multimodal capabilities such as GUI control and vision with integration into real-world tools.

Installation

Agent TARS CLI requires Node.js version 22 or higher.
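The version requirement can be checked before installing. The following is a minimal sketch rather than part of the project: node_version_ok is a hypothetical helper name, and it assumes only that `node --version` prints a string such as v22.3.0.

```shell
# Return success (0) when a Node.js version string such as "v22.3.0"
# has a major version of at least 22, the minimum stated above.
# node_version_ok is a hypothetical helper, not part of Agent TARS.
node_version_ok() {
  version="${1#v}"        # strip the leading "v" that `node --version` prints
  major="${version%%.*}"  # keep only the text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
  esac
  [ "$major" -ge 22 ]
}

# Check the installed Node.js, if any, and report the result.
if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
  echo "Node.js $(node --version) is new enough"
else
  echo "Node.js 22 or higher is required" >&2
fi
```

Only the major version is compared, which is sufficient for a "22 or higher" requirement.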

You can launch it directly using npx:

npx @agent-tars/cli@latest

Alternatively, install it globally:

npm install @agent-tars/cli@latest -g

Then, run it with your preferred model provider and API key:

agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
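To keep the key out of shell history, the same command can read it from an environment variable. This is a sketch under assumptions: run_agent_tars is a hypothetical wrapper, and AGENT_TARS_API_KEY is a variable name chosen here for illustration, not one the CLI is known to read. Only the --provider, --model, and --apiKey flags shown above are used.

```shell
# Hypothetical wrapper: launch Agent TARS with the Anthropic settings shown
# above, taking the key from an environment variable instead of typing it inline.
# AGENT_TARS_API_KEY is an illustrative name chosen for this sketch.
run_agent_tars() {
  # Refuse to start when the key is missing or empty.
  if [ -z "${AGENT_TARS_API_KEY:-}" ]; then
    echo "AGENT_TARS_API_KEY is not set" >&2
    return 1
  fi
  agent-tars --provider anthropic \
    --model claude-3-7-sonnet-latest \
    --apiKey "$AGENT_TARS_API_KEY"
}
```

After exporting AGENT_TARS_API_KEY in the current shell session, calling run_agent_tars starts the CLI with that key.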

For comprehensive setup instructions, refer to the Quick Start guide.

Examples

The project's showcases cover several automation scenarios.

Agent TARS Showcase

  • Flight Booking: Automate complex tasks like booking flights.
    • Instruction: "Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline."
  • Hotel Booking & Information Gathering:
    • Instruction: "I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me."
  • Chart Generation: Integrate with external tools for data visualization.

For more use cases, explore the GitHub Issues showcase.

Why Use UI-TARS-desktop?

UI-TARS-desktop covers AI-driven automation from the command line to the desktop. The first four items below are Agent TARS features; the remainder describe UI-TARS Desktop:

  • One-Click Out-of-the-box CLI: Easy execution with both headful Web UI and headless server options.
  • Hybrid Browser Agent: Flexible browser control using GUI Agent, DOM, or a combined strategy.
  • Event Stream: Protocol-driven event stream for context engineering and agent UI development.
  • MCP Integration: Kernel built on MCP, supporting the mounting of MCP Servers for real-world tool connections.
  • Natural Language Control: Powered by Vision-Language Models for intuitive interaction.
  • Screenshot and Visual Recognition: Enhanced understanding of interfaces through visual capabilities.
  • Precise Control: Accurate mouse and keyboard control for detailed operations.
  • Cross-Platform Support: Compatibility across Windows, macOS, and browser environments.
  • Real-time Feedback: Immediate status display for ongoing tasks.
  • Private and Secure: Fully local processing ensures data privacy and security.
