{"name":"UI-TARS-desktop: The Open-Source Multimodal AI Agent Stack","description":"UI-TARS-desktop is an open-source multimodal AI Agent stack from ByteDance, designed to connect cutting-edge AI models with agent infrastructure. It provides both Agent TARS, a general multimodal AI agent with CLI and Web UI, and UI-TARS Desktop, a native GUI agent for local and remote computer/browser control. This powerful tool aims to enable human-like task completion through rich multimodal capabilities and seamless integration with real-world tools.","github":"https://github.com/bytedance/UI-TARS-desktop","url":"https://osrepos.com/repo/bytedance-ui-tars-desktop","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/bytedance-ui-tars-desktop","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/bytedance-ui-tars-desktop.md","json":"https://osrepos.com/repo/bytedance-ui-tars-desktop.json","topics":["AI Agent","Multimodal AI","GUI Automation","TypeScript","Desktop Application","Browser Automation","Vision Language Model","Open Source"],"keywords":["AI Agent","Multimodal AI","GUI Automation","TypeScript","Desktop Application","Browser Automation","Vision Language Model","Open Source"],"stars":null,"summary":"UI-TARS-desktop is an open-source multimodal AI Agent stack from ByteDance, designed to connect cutting-edge AI models with agent infrastructure. It provides both Agent TARS, a general multimodal AI agent with CLI and Web UI, and UI-TARS Desktop, a native GUI agent for local and remote computer/browser control. 
This powerful tool aims to enable human-like task completion through rich multimodal capabilities and seamless integration with real-world tools.","content":"## Introduction\nUI-TARS-desktop is an open-source multimodal AI Agent stack from ByteDance, designed to connect cutting-edge AI models with agent infrastructure. It encompasses two primary projects: Agent TARS, a versatile multimodal AI agent, and UI-TARS Desktop, a native GUI agent. This powerful stack aims to facilitate human-like task completion through rich multimodal capabilities, including GUI agent and vision, and seamless integration with various real-world tools.\n\n## Installation\nGetting started with Agent TARS CLI is straightforward. Ensure you have Node.js version 22 or higher installed.\n\nYou can launch it directly using `npx`:\n```bash\nnpx @agent-tars/cli@latest\n```\n\nAlternatively, install it globally:\n```bash\nnpm install @agent-tars/cli@latest -g\n```\n\nThen, run it with your preferred model provider and API key:\n```bash\nagent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key\nagent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key\n```\n\nFor comprehensive setup instructions, refer to the [Quick Start guide](https://agent-tars.com/guide/get-started/quick-start.html \"Quick Start Guide\").\n\n## Examples\nUI-TARS-desktop demonstrates powerful automation capabilities across various scenarios.\n\n### Agent TARS Showcase\n*   **Flight Booking:** Automate complex tasks like booking flights.\n    *   **Instruction:** \"Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline.\"\n    *   [Watch video example](https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8 \"Flight Booking Demo\")\n*   **Hotel Booking & Information Gathering:**\n    *   **Instruction:** \"I am in Los Angeles from September 1st to September 6th, with a 
budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me.\"\n    *   [Watch video example](https://github.com/user-attachments/assets/c9489936-afdc-4d12-adda-d4b90d2a869d \"Hotel Booking Demo\")\n*   **Chart Generation:** Integrate with external tools for data visualization.\n    *   **Instruction:** \"Draw me a chart of Hangzhou's weather for one month.\"\n    *   [Watch video example](https://github.com/user-attachments/assets/a9fd72d0-01bb-4233-aa27-ca95194bbce9 \"Chart Generation Demo\")\n\nFor more use cases, explore the [GitHub Issues showcase](https://github.com/bytedance/UI-TARS-desktop/issues/842 \"More Use Cases\").\n\n### UI-TARS Desktop Showcase\n*   **VS Code Configuration:** Control desktop applications with natural language.\n    *   **Instruction:** \"Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting.\"\n    *   [Watch local operator video](https://github.com/user-attachments/assets/e0914ce9-ad33-494b-bdec-0c25c1b01a27 \"VS Code Local Operator Demo\")\n    *   [Watch remote operator video](https://github.com/user-attachments/assets/01e49b69-7070-46c8-b3e3-2aaaaec71800 \"VS Code Remote Operator Demo\")\n*   **GitHub Project Inquiry:** Interact with web interfaces.\n    *   **Instruction:** \"Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub?\"\n    *   [Watch local operator video](https://github.com/user-attachments/assets/3d159f54-d24a-4268-96c0-e149607e9199 \"GitHub Local Operator Demo\")\n    *   [Watch remote operator video](https://github.com/user-attachments/assets/072fb72d-7394-4bfa-95f5-4736e29f7e58 \"GitHub Remote Operator Demo\")\n\n## Why Use UI-TARS-desktop?\nUI-TARS-desktop offers a comprehensive solution for AI-driven automation, providing:\n*   **One-Click Out-of-the-box CLI:** Easy execution with both headful Web UI and headless server 
options.\n*   **Hybrid Browser Agent:** Flexible browser control using GUI Agent, DOM, or a combined strategy.\n*   **Event Stream:** Protocol-driven event stream for context engineering and agent UI development.\n*   **MCP Integration:** Kernel built on MCP, supporting the mounting of MCP Servers for real-world tool connections.\n*   **Natural Language Control:** Powered by Vision-Language Models for intuitive interaction.\n*   **Screenshot and Visual Recognition:** Enhanced understanding of interfaces through visual capabilities.\n*   **Precise Control:** Accurate mouse and keyboard control for detailed operations.\n*   **Cross-Platform Support:** Compatibility across Windows, macOS, and browser environments.\n*   **Real-time Feedback:** Immediate status display for ongoing tasks.\n*   **Private and Secure:** Fully local processing ensures data privacy and security.\n\n## Links\n*   **GitHub Repository:** [https://github.com/bytedance/UI-TARS-desktop](https://github.com/bytedance/UI-TARS-desktop \"GitHub Repository\")\n*   **Official Website:** [https://agent-tars.com](https://agent-tars.com \"Official Website\")\n*   **Agent TARS Quick Start:** [https://agent-tars.com/guide/get-started/quick-start.html](https://agent-tars.com/guide/get-started/quick-start.html \"Agent TARS Quick Start\")\n*   **UI-TARS Desktop Quick Start:** [https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md](https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md \"UI-TARS Desktop Quick Start\")\n*   **Comprehensive Documentation:** [https://agent-tars.com/guide/get-started/introduction.html](https://agent-tars.com/guide/get-started/introduction.html \"Comprehensive Documentation\")\n*   **Hugging Face Models:** [https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B \"Hugging Face Models\")\n*   **Research Paper (arXiv):** [https://arxiv.org/abs/2501.12326](https://arxiv.org/abs/2501.12326 
\"Research Paper\")","metrics":{"detailViews":2,"githubClicks":4},"dates":{"published":null,"modified":"2026-05-06T16:33:40.000Z"}}