# UI-TARS-desktop: The Open-Source Multimodal AI Agent Stack

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Repository profile: https://osrepos.com/repo/bytedance-ui-tars-desktop

UI-TARS-desktop is an open-source multimodal AI agent stack from ByteDance that connects AI models with agent infrastructure. It provides two projects: Agent TARS, a general multimodal AI agent with a CLI and Web UI, and UI-TARS Desktop, a native GUI agent for controlling local and remote computers and browsers. The stack aims to complete tasks the way a person would, combining multimodal capabilities with integrations for real-world tools.

GitHub: https://github.com/bytedance/UI-TARS-desktop

## Topics

- AI Agent
- Multimodal AI
- GUI Automation
- TypeScript
- Desktop Application
- Browser Automation
- Vision Language Model
- Open Source

## Repository Information

Last analyzed by OSRepos: Wed May 06 2026 17:33:40 GMT+0100 (Western European Summer Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Content

## Introduction
UI-TARS-desktop is ByteDance's open-source multimodal AI agent stack for connecting AI models with agent infrastructure. It comprises two projects: Agent TARS, a general-purpose multimodal AI agent, and UI-TARS Desktop, a native GUI agent. The stack targets human-like task completion by combining multimodal capabilities, such as GUI control and vision, with integrations for real-world tools.

## Installation
Getting started with Agent TARS CLI is straightforward. Ensure you have Node.js version 22 or higher installed.
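The version requirement can also be checked programmatically. The sketch below is a minimal illustration and not part of Agent TARS: `meetsMinimumNode` is a hypothetical helper that compares the major component of a version string against the minimum of 22 stated above.

```typescript
// Hypothetical helper (not part of Agent TARS): checks whether a Node.js
// version string satisfies a minimum major version.
function meetsMinimumNode(version: string, minMajor: number = 22): boolean {
  // Accept both "22.3.0" and "v22.3.0" (the format printed by `node --version`).
  const match = /^v?(\d+)\./.exec(version.trim());
  const major = match === null ? undefined : match[1];
  if (major === undefined) {
    return false; // Unrecognized format: treat as not meeting the requirement.
  }
  return parseInt(major, 10) >= minMajor;
}

// In a Node.js script, process.versions.node would be the value to pass in.
console.log(meetsMinimumNode("v22.3.0")); // true
console.log(meetsMinimumNode("20.11.1")); // false
```

A check like this can run in a setup script before the Agent TARS CLI is invoked.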

You can launch it directly using `npx`:
```bash
npx @agent-tars/cli@latest
```

Alternatively, install it globally:
```bash
npm install @agent-tars/cli@latest -g
```

Then, run it with your preferred model provider and API key:
```bash
# Volcengine provider with a Doubao vision model
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key

# Anthropic provider with a Claude model
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
```

For comprehensive setup instructions, refer to the [Quick Start guide](https://agent-tars.com/guide/get-started/quick-start.html "Quick Start Guide").
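
The two invocations above differ only in provider and model, so a wrapper script could assemble them from a small config object. The following is a sketch under assumptions: `ModelConfig` and `buildAgentTarsArgs` are hypothetical names introduced here, and the only flags used are the three shown above (`--provider`, `--model`, `--apiKey`). It only builds the argument list; spawning the process is left out.

```typescript
// Hypothetical config shape for illustration; not an Agent TARS type.
interface ModelConfig {
  provider: string; // e.g. "volcengine" or "anthropic"
  model: string; // model identifier understood by that provider
  apiKey: string; // secret; prefer reading it from the environment
}

// Builds the argument list for the `agent-tars` command using only the
// flags that appear in the commands above.
function buildAgentTarsArgs(config: ModelConfig): string[] {
  const entries: Array<[string, string]> = [
    ["--provider", config.provider],
    ["--model", config.model],
    ["--apiKey", config.apiKey],
  ];
  const args: string[] = [];
  for (const entry of entries) {
    const flag = entry[0];
    const value = entry[1];
    // Fail early instead of producing a command with a dangling flag.
    if (value.trim() === "") {
      throw new Error("Missing value for " + flag);
    }
    args.push(flag, value);
  }
  return args;
}

const exampleArgs = buildAgentTarsArgs({
  provider: "anthropic",
  model: "claude-3-7-sonnet-latest",
  apiKey: "your-api-key",
});
console.log(["agent-tars"].concat(exampleArgs).join(" "));
// agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
```

Returning an array rather than a single string keeps each value as a separate argument, which avoids quoting problems if the list is later handed to a process-spawning API.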

## Examples
The demos below show each project carrying out a multi-step task from a single natural-language instruction.

### Agent TARS Showcase
*   **Flight Booking:** Automate complex tasks like booking flights.
    *   **Instruction:** "Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline."
    *   [Watch video example](https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8 "Flight Booking Demo")
*   **Hotel Booking & Information Gathering:**
    *   **Instruction:** "I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me."
    *   [Watch video example](https://github.com/user-attachments/assets/c9489936-afdc-4d12-adda-d4b90d2a869d "Hotel Booking Demo")
*   **Chart Generation:** Integrate with external tools for data visualization.
    *   **Instruction:** "Draw me a chart of Hangzhou's weather for one month."
    *   [Watch video example](https://github.com/user-attachments/assets/a9fd72d0-01bb-4233-aa27-ca95194bbce9 "Chart Generation Demo")

For more use cases, explore the [GitHub Issues showcase](https://github.com/bytedance/UI-TARS-desktop/issues/842 "More Use Cases").

### UI-TARS Desktop Showcase
*   **VS Code Configuration:** Control desktop applications with natural language.
    *   **Instruction:** "Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting."
    *   [Watch local operator video](https://github.com/user-attachments/assets/e0914ce9-ad33-494b-bdec-0c25c1b01a27 "VS Code Local Operator Demo")
    *   [Watch remote operator video](https://github.com/user-attachments/assets/01e49b69-7070-46c8-b3e3-2aaaaec71800 "VS Code Remote Operator Demo")
*   **GitHub Project Inquiry:** Interact with web interfaces.
    *   **Instruction:** "Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub?"
    *   [Watch local operator video](https://github.com/user-attachments/assets/3d159f54-d24a-4268-96c0-e149607e9199 "GitHub Local Operator Demo")
    *   [Watch remote operator video](https://github.com/user-attachments/assets/072fb72d-7394-4bfa-95f5-4736e29f7e58 "GitHub Remote Operator Demo")

## Why Use UI-TARS-desktop?
Across Agent TARS and UI-TARS Desktop, the stack provides:
*   **One-Click Out-of-the-box CLI:** Easy execution with both headful Web UI and headless server options.
*   **Hybrid Browser Agent:** Flexible browser control using GUI Agent, DOM, or a combined strategy.
*   **Event Stream:** Protocol-driven event stream for context engineering and agent UI development.
*   **MCP Integration:** Kernel built on MCP, supporting the mounting of MCP Servers for real-world tool connections.
*   **Natural Language Control:** Powered by Vision-Language Models for intuitive interaction.
*   **Screenshot and Visual Recognition:** Enhanced understanding of interfaces through visual capabilities.
*   **Precise Control:** Accurate mouse and keyboard control for detailed operations.
*   **Cross-Platform Support:** Compatibility across Windows, macOS, and browser environments.
*   **Real-time Feedback:** Immediate status display for ongoing tasks.
*   **Private and Secure:** Fully local processing ensures data privacy and security.
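
To make the Event Stream item concrete, here is a small sketch of how one typed stream of events could feed both a model context and a status display. It is an illustration only: the event names (`user_message`, `tool_call`, `tool_result`, `assistant_message`) and the helpers `toModelContext` and `currentStatus` are hypothetical and are not taken from the Agent TARS protocol.

```typescript
// Hypothetical event shapes for illustration; the real Agent TARS event
// stream may use different names and fields.
type AgentEvent =
  | { type: "user_message"; text: string }
  | { type: "tool_call"; id: string; tool: string; input: string }
  | { type: "tool_result"; id: string; output: string }
  | { type: "assistant_message"; text: string };

// Context engineering: flatten the stream into the lines a model would see,
// truncating long tool output so the context stays small.
function toModelContext(events: AgentEvent[], maxOutputChars: number = 200): string[] {
  return events.map((event): string => {
    switch (event.type) {
      case "user_message":
        return "user: " + event.text;
      case "tool_call":
        return "call " + event.tool + "(" + event.input + ")";
      case "tool_result":
        return "result: " + event.output.slice(0, maxOutputChars);
      case "assistant_message":
        return "assistant: " + event.text;
    }
  });
}

// Agent UI: derive a one-line status from the same events by tracking
// tool calls that have no matching result yet.
function currentStatus(events: AgentEvent[]): string {
  let pending: { id: string; tool: string }[] = [];
  for (const event of events) {
    if (event.type === "tool_call") {
      pending.push({ id: event.id, tool: event.tool });
    } else if (event.type === "tool_result") {
      const finishedId = event.id;
      pending = pending.filter((call) => call.id !== finishedId);
    }
  }
  if (pending.length === 0) {
    return "idle";
  }
  return "running: " + pending.map((call) => call.tool).join(", ");
}

const sampleEvents: AgentEvent[] = [
  { type: "user_message", text: "Draw a weather chart" },
  { type: "tool_call", id: "1", tool: "search", input: "Hangzhou weather" },
];
console.log(toModelContext(sampleEvents));
console.log(currentStatus(sampleEvents)); // running: search
```

The point of the design is that the model context and the UI read the same ordered log, so neither needs its own bookkeeping about what the agent has done.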

## Links
*   **GitHub Repository:** [https://github.com/bytedance/UI-TARS-desktop](https://github.com/bytedance/UI-TARS-desktop "GitHub Repository")
*   **Official Website:** [https://agent-tars.com](https://agent-tars.com "Official Website")
*   **Agent TARS Quick Start:** [https://agent-tars.com/guide/get-started/quick-start.html](https://agent-tars.com/guide/get-started/quick-start.html "Agent TARS Quick Start")
*   **UI-TARS Desktop Quick Start:** [https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md](https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md "UI-TARS Desktop Quick Start")
*   **Comprehensive Documentation:** [https://agent-tars.com/guide/get-started/introduction.html](https://agent-tars.com/guide/get-started/introduction.html "Comprehensive Documentation")
*   **Hugging Face Models:** [https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B "Hugging Face Models")
*   **Research Paper (arXiv):** [https://arxiv.org/abs/2501.12326](https://arxiv.org/abs/2501.12326 "Research Paper")