UI-TARS-desktop: The Open-Source Multimodal AI Agent Stack
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
UI-TARS-desktop is an open-source multimodal AI agent stack from ByteDance that connects cutting-edge AI models with agent infrastructure. It provides two agents: Agent TARS, a general multimodal AI agent with a CLI and Web UI, and UI-TARS Desktop, a native GUI agent for controlling local and remote computers and browsers. The stack aims to complete tasks the way a person would, combining rich multimodal capabilities with integrations for real-world tools.
Introduction
UI-TARS-desktop is an open-source multimodal AI agent stack from ByteDance that connects cutting-edge AI models with agent infrastructure. It comprises two main projects: Agent TARS, a general-purpose multimodal AI agent, and UI-TARS Desktop, a native GUI agent. Together they aim to complete tasks in a human-like way by combining multimodal capabilities, such as GUI operation and vision, with integrations for a range of real-world tools.
Installation
Getting started with the Agent TARS CLI requires Node.js version 22 or higher.
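The Node.js requirement can be checked before launching the CLI. A minimal TypeScript sketch that compares the major version of the running runtime against 22; the function names are hypothetical and not part of Agent TARS:

```typescript
// Minimum major version stated in the Installation section.
const MIN_NODE_MAJOR = 22;

// Parse the major component of a version string such as "22.3.0" or "v22.3.0".
function parseMajor(version: string): number {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognized Node.js version string: ${version}`);
  }
  return major;
}

// True when `version` meets the minimum; defaults to the current runtime's version.
function meetsNodeRequirement(version: string = process.versions.node): boolean {
  return parseMajor(version) >= MIN_NODE_MAJOR;
}

// Warn (without exiting) if the current runtime is too old.
if (!meetsNodeRequirement()) {
  console.error(
    `Agent TARS CLI expects Node.js >= ${MIN_NODE_MAJOR}; found ${process.versions.node}`,
  );
}
```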
You can launch it directly using npx:
npx @agent-tars/cli@latest
Alternatively, install it globally:
npm install @agent-tars/cli@latest -g
Then run it, specifying your model provider, model name, and API key:
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
For comprehensive setup instructions, refer to the Quick Start guide.
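The commands above type the API key directly into the shell. A minimal TypeScript sketch, assuming the CLI is installed globally as `agent-tars` and using only the `--provider`, `--model`, and `--apiKey` flags shown above, reads the key from an environment variable and launches the CLI programmatically. The helper names and the `AGENT_TARS_API_KEY` variable name are hypothetical, not Agent TARS conventions:

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Options mirror the three flags used in the commands above.
interface LaunchOptions {
  provider: string; // e.g. "anthropic" or "volcengine"
  model: string; // e.g. "claude-3-7-sonnet-latest"
  apiKey: string;
}

// Hypothetical helper: read the key from an environment variable instead of
// typing it into the shell. The default variable name is an assumption.
function apiKeyFromEnv(name: string = "AGENT_TARS_API_KEY"): string {
  const key = process.env[name];
  if (!key) {
    throw new Error(`Set ${name} before launching agent-tars`);
  }
  return key;
}

// Build the argument vector using only the flags from the Installation section.
function buildAgentTarsArgs({ provider, model, apiKey }: LaunchOptions): string[] {
  return ["--provider", provider, "--model", model, "--apiKey", apiKey];
}

// Spawn the CLI without a shell, so each argument is passed verbatim
// instead of being re-parsed and quoted by a shell.
function launchAgentTars(opts: LaunchOptions): ChildProcess {
  return spawn("agent-tars", buildAgentTarsArgs(opts), { stdio: "inherit" });
}
```

Usage: `launchAgentTars({ provider: "anthropic", model: "claude-3-7-sonnet-latest", apiKey: apiKeyFromEnv() })`. The key still appears in the child process's argument list, so it may be visible in local process listings. On Windows, npm installs global commands as `.cmd` shims, which `spawn` may refuse to run unless `shell: true` is set.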
Examples
The following showcases illustrate the kinds of tasks each agent can automate from a single natural-language instruction.
Agent TARS Showcase
- Flight Booking: Automate complex tasks like booking flights.
  - Instruction: "Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline."
- Hotel Booking & Information Gathering: Combine a booking with compiling related travel information.
  - Instruction: "I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me."
- Chart Generation: Integrate with external tools for data visualization.
  - Instruction: "Draw me a chart of Hangzhou's weather for one month."
For more use cases, explore the GitHub Issues showcase.
UI-TARS Desktop Showcase
- VS Code Configuration: Control desktop applications with natural language (demonstrated with both the local and remote operators).
  - Instruction: "Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting."
- GitHub Project Inquiry: Interact with web interfaces (demonstrated with both the local and remote operators).
  - Instruction: "Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub?"
Why Use UI-TARS-desktop?
UI-TARS-desktop's key features for AI-driven automation include:
- One-Click, Out-of-the-Box CLI: Runs with either a headful Web UI or as a headless server.
- Hybrid Browser Agent: Flexible browser control using GUI Agent, DOM, or a combined strategy.
- Event Stream: Protocol-driven event stream for context engineering and agent UI development.
- MCP Integration: The agent kernel is built on the Model Context Protocol (MCP) and can mount MCP servers to connect to real-world tools.
- Natural Language Control: Powered by Vision-Language Models for intuitive interaction.
- Screenshot and Visual Recognition: Enhanced understanding of interfaces through visual capabilities.
- Precise Control: Accurate mouse and keyboard control for detailed operations.
- Cross-Platform Support: Works on Windows, macOS, and in browser environments.
- Real-time Feedback: Immediate status display for ongoing tasks.
- Private and Secure: Fully local processing ensures data privacy and security.
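The MCP integration listed above means tools are reached through the Model Context Protocol, whose messages are JSON-RPC 2.0 requests such as `tools/list` and `tools/call`. The TypeScript sketch below builds those two requests and frames them for MCP's stdio transport, which carries one JSON message per line. It illustrates the protocol only, not Agent TARS's internal code; the `get_weather` tool and its `city` argument are hypothetical:

```typescript
// Shape of a JSON-RPC 2.0 request, the message format MCP is built on.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: P;
}

// Parameters of an MCP `tools/call` request: the tool name plus its arguments.
interface ToolCallParams {
  name: string;
  arguments: Record<string, unknown>;
}

// Request ids must not be reused within a session; a counter is enough here.
let nextId = 1;

// Ask a mounted MCP server which tools it exposes.
function listToolsRequest(): JsonRpcRequest<Record<string, never>> {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/list", params: {} };
}

// Invoke one tool by name with JSON arguments.
function callToolRequest(
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest<ToolCallParams> {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Frame a message for the stdio transport. JSON.stringify without an indent
// argument escapes newlines inside strings, so the frame is a single line.
function toStdioFrame(message: JsonRpcRequest<unknown>): string {
  return JSON.stringify(message) + "\n";
}
```

A real client must first complete MCP's `initialize` handshake and read each matching response by `id`; the sketch omits both to keep the message shapes in focus.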
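Screenshot-based control depends on mapping the point a vision-language model picks on a screenshot back to a real screen position. A minimal TypeScript sketch, assuming the screenshot is a uniformly scaled capture of the whole screen and that the model reports points either in the screenshot's own pixels or on a normalized grid (for example 1000 x 1000); `mapToScreen` is a hypothetical helper, not UI-TARS code:

```typescript
interface Size {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Map a point from the coordinate space the model saw (`source`) onto the
// screen's logical coordinate space (`target`) by scaling x and y
// independently, then round and clamp to valid pixel indices.
function mapToScreen(p: Point, source: Size, target: Size): Point {
  const x = Math.round((p.x / source.width) * target.width);
  const y = Math.round((p.y / source.height) * target.height);
  return {
    x: Math.min(Math.max(x, 0), target.width - 1),
    y: Math.min(Math.max(y, 0), target.height - 1),
  };
}
```

For example, a click predicted at (1000, 600) on a 2880 x 1800 HiDPI screenshot maps to (500, 300) on a 1440 x 900 logical screen. Clamping keeps a prediction on the image's far edge from landing one pixel off-screen.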
Links
- GitHub Repository: https://github.com/bytedance/UI-TARS-desktop
- Official Website: https://agent-tars.com
- Agent TARS Quick Start: https://agent-tars.com/guide/get-started/quick-start.html
- UI-TARS Desktop Quick Start: https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md
- Comprehensive Documentation: https://agent-tars.com/guide/get-started/introduction.html
- Hugging Face Models: https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B
- Research Paper (arXiv): https://arxiv.org/abs/2501.12326
Related repositories
Similar repositories that may be relevant next.

Feynman: The Open Source AI Research Agent
June 2, 2026
Feynman is an open-source AI research agent designed to automate and streamline complex research tasks. Built with TypeScript, it leverages multiple agents and tools to conduct in-depth investigations, literature reviews, and even experiment replications, providing source-grounded outputs.

CodeGraph: Supercharge AI Coding Agents with Semantic Code Intelligence
May 25, 2026
CodeGraph is a powerful, pre-indexed code knowledge graph designed to enhance AI coding agents like Claude Code, Cursor, and Codex. It significantly reduces token usage and tool calls, offering a faster and more cost-effective way for agents to understand codebases. This 100% local solution provides semantic code intelligence, improving agent efficiency and accuracy.

Agentic Inbox: A Self-Hosted Email Client with AI on Cloudflare Workers
May 18, 2026
Agentic Inbox is an innovative self-hosted email client that integrates an AI agent, running entirely on Cloudflare Workers. It provides a modern web interface for managing emails, enhanced by AI capabilities for reading, searching, and drafting replies. This project leverages Cloudflare's robust ecosystem, including Email Routing, Durable Objects, R2, and Workers AI, to deliver a powerful and secure email solution.

BrowserOS: The Open-Source Agentic Browser for AI-Powered Web Automation
April 3, 2026
BrowserOS is an innovative open-source Chromium fork designed to natively run AI agents, offering a privacy-first alternative to other AI browsers. It allows users to automate web tasks with natural language, integrate with various LLM providers, and maintain control over their data. This project combines a full-featured browser with powerful AI capabilities for enhanced productivity and privacy.