UI-TARS-desktop: The Open-Source Multimodal AI Agent Stack

Summary
UI-TARS-desktop is an open-source multimodal AI Agent stack from ByteDance, designed to connect cutting-edge AI models with agent infrastructure. It provides both Agent TARS, a general multimodal AI agent with a CLI and Web UI, and UI-TARS Desktop, a native GUI agent for controlling local and remote computers and browsers. The stack aims to enable human-like task completion through multimodal capabilities and integration with real-world tools.
Introduction
The stack consists of two primary projects: Agent TARS, a versatile multimodal AI agent, and UI-TARS Desktop, a native GUI agent. Both pair multimodal capabilities, including GUI operation and vision, with integrations for a range of real-world tools.
Installation
Getting started with the Agent TARS CLI is straightforward. Make sure Node.js version 22 or higher is installed.
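To confirm the Node.js requirement before installing, a small shell check can help. This is a minimal sketch, not part of the project: the function name node_version_ok is hypothetical, and it only inspects the major version reported by node --version.

```shell
#!/usr/bin/env bash
# Minimal sketch: check whether a Node.js version string (as printed by
# "node --version", e.g. v22.3.0) meets the 22-or-higher requirement.
# node_version_ok is a hypothetical helper name, not part of Agent TARS.
node_version_ok() {
  local version="${1#v}"         # drop a leading "v": v22.3.0 -> 22.3.0
  local major="${version%%.*}"   # keep the part before the first dot: 22
  case "$major" in
    ''|*[!0-9]*) return 1 ;;     # not a plain number: treat as unsupported
  esac
  [ "$major" -ge 22 ]
}

# Check the locally installed Node.js, if any.
if command -v node >/dev/null 2>&1; then
  if node_version_ok "$(node --version)"; then
    echo "Node.js $(node --version) meets the requirement"
  else
    echo "Node.js $(node --version) is too old; install version 22 or newer" >&2
  fi
else
  echo "Node.js is not installed" >&2
fi
```

The function returns success only when the major version is 22 or later, so it can also guard an install script before npm install runs.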
You can launch it directly using npx:
npx @agent-tars/cli@latest
Alternatively, install it globally:
npm install @agent-tars/cli@latest -g
Then, run it with your preferred model provider and API key:
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
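Typing the key as a literal argument every time is error-prone. One option is to keep it in an environment variable and wrap the command in a small shell function. This is a hedged sketch: run_agent_tars, AGENT_TARS_API_KEY, and AGENT_TARS_BIN are hypothetical names chosen for this example; only the --provider, --model, and --apiKey flags come from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the agent-tars CLI with the --provider, --model,
# and --apiKey flags shown above, reading the key from AGENT_TARS_API_KEY.
# AGENT_TARS_BIN (also hypothetical) overrides the command, e.g. "echo"
# for a dry run that only prints the assembled arguments.
run_agent_tars() {
  local provider="$1"
  local model="$2"
  local bin="${AGENT_TARS_BIN:-agent-tars}"

  # Refuse to start without a key instead of passing an empty value.
  if [ -z "${AGENT_TARS_API_KEY:-}" ]; then
    echo "AGENT_TARS_API_KEY is not set" >&2
    return 1
  fi

  "$bin" --provider "$provider" --model "$model" --apiKey "$AGENT_TARS_API_KEY"
}
```

For example, after exporting AGENT_TARS_API_KEY, running run_agent_tars anthropic claude-3-7-sonnet-latest starts the CLI, while setting AGENT_TARS_BIN=echo only prints the command line. The key is still passed as a command-line argument, so it may be visible to other local users through the process list.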
For comprehensive setup instructions, refer to the Quick Start guide.
Examples
The following showcases illustrate the kinds of tasks each project can automate.
Agent TARS Showcase
- Flight Booking: Automate complex, multi-step tasks such as booking flights.
  - Instruction: "Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline."
  - Watch video example
- Hotel Booking & Information Gathering: Book accommodation and compile related travel information in a single task.
  - Instruction: "I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me."
  - Watch video example
- Chart Generation: Integrate with external tools for data visualization.
  - Instruction: "Draw me a chart of Hangzhou's weather for one month."
  - Watch video example
For more use cases, explore the GitHub Issues showcase.
UI-TARS Desktop Showcase
- VS Code Configuration: Control desktop applications with natural language.
  - Instruction: "Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting."
  - Watch local operator video
  - Watch remote operator video
- GitHub Project Inquiry: Interact with web interfaces.
  - Instruction: "Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub?"
  - Watch local operator video
  - Watch remote operator video
Why Use UI-TARS-desktop?
UI-TARS-desktop offers a comprehensive solution for AI-driven automation, providing:
- One-Click, Out-of-the-Box CLI: Run it with either a headful Web UI or as a headless server.
- Hybrid Browser Agent: Flexible browser control using GUI Agent, DOM, or a combined strategy.
- Event Stream: Protocol-driven event stream for context engineering and agent UI development.
- MCP Integration: The kernel is built on the Model Context Protocol (MCP) and can mount MCP servers to connect to real-world tools.
- Natural Language Control: Powered by Vision-Language Models for intuitive interaction.
- Screenshot and Visual Recognition: Enhanced understanding of interfaces through visual capabilities.
- Precise Control: Accurate mouse and keyboard control for detailed operations.
- Cross-Platform Support: Works across Windows, macOS, and browser environments.
- Real-time Feedback: Immediate status display for ongoing tasks.
- Private and Secure: Fully local processing helps keep data private and secure.
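Because the kernel is built on MCP, tool use ultimately comes down to protocol messages exchanged with mounted servers. Under the MCP specification, a client invokes a server tool with a JSON-RPC 2.0 request whose method is tools/call, and the stdio transport carries one JSON message per line. The sketch below only formats such a request; it is not Agent TARS's implementation, mcp_tools_call and the tool name get_weather are hypothetical, and a real session must complete the initialize handshake before any tools/call request is sent.

```shell
#!/usr/bin/env bash
# Minimal sketch: print an MCP "tools/call" request as one line of
# JSON-RPC 2.0, the framing used by the stdio transport.
# Assumptions: the tool name needs no JSON escaping, and the arguments
# are passed in as an already-valid JSON object (default: {}).
mcp_tools_call() {
  local id="$1"
  local tool="$2"
  local args_json="$3"
  [ -n "$args_json" ] || args_json='{}'
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' \
    "$id" "$tool" "$args_json"
}
```

For example, mcp_tools_call 1 get_weather '{"city":"Hangzhou"}' prints {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"get_weather","arguments":{"city":"Hangzhou"}}}. Keeping the message on a single line matters because the stdio transport uses newlines as message delimiters.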
Links
- GitHub Repository: https://github.com/bytedance/UI-TARS-desktop
- Official Website: https://agent-tars.com
- Agent TARS Quick Start: https://agent-tars.com/guide/get-started/quick-start.html
- UI-TARS Desktop Quick Start: https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md
- Comprehensive Documentation: https://agent-tars.com/guide/get-started/introduction.html
- Hugging Face Model (UI-TARS-1.5-7B): https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B
- Research Paper (arXiv): https://arxiv.org/abs/2501.12326