Agent-S: Open Agentic Framework for Human-like Computer Use

Summary

Agent-S is an open agentic framework that enables AI agents to interact with computers autonomously, using them the way humans do. It provides intelligent GUI agents that learn from past experience to perform complex tasks, making it a foundation for AI automation and agent-based systems.

Repository Info

Updated on December 15, 2025

Introduction

Agent-S is an open-source framework from Simular AI that enables AI agents to interact with computers autonomously, much as a human user would. At its core, Agent-S aims to build intelligent GUI agents that learn from past experience and execute complex tasks across operating systems, including Windows, macOS, and Linux.

The framework has achieved state-of-the-art results on challenging benchmarks such as OSWorld, WindowsAgentArena, and AndroidWorld, and its latest iteration, Agent S3, reports performance approaching human-level accuracy. Agent-S is a flexible platform for anyone interested in advanced AI, automation, or contributing to agent-based systems.

For more details, visit the Agent-S GitHub repository.

Installation

Getting started with Agent-S is straightforward. Follow these steps to set up the framework on your machine.

Prerequisites

  • Single Monitor: Agent-S is optimized for single-monitor setups.
  • Security: The agent executes Python code to control your computer, so use it with caution and only in trusted environments.
  • Supported Platforms: Agent-S supports Linux, macOS, and Windows.

Installation Steps

To install Agent S3 without cloning the repository, use pip:

pip install gui-agents

If you plan to contribute or test changes, clone the repository and install it in editable mode:

git clone https://github.com/simular-ai/Agent-S.git
cd Agent-S
pip install -e .

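Additionally, pytesseract requires the Tesseract OCR engine to be installed. On macOS, use Homebrew:

brew install tesseract

On Debian or Ubuntu, the corresponding package is tesseract-ocr (for example, sudo apt-get install tesseract-ocr).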
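pytesseract is a wrapper that calls the tesseract executable, so a missing binary usually only shows up as an error at runtime. A minimal pre-flight check, assuming the binary is expected on PATH under the name tesseract; the helper name find_tesseract is hypothetical and not part of Agent-S:

```python
import shutil
from typing import Optional


def find_tesseract(name: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract binary, or None if it is not on PATH.

    Hypothetical pre-flight helper; Agent-S itself does not ship this function.
    """
    # shutil.which searches PATH the same way the shell would.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found on PATH; install it before running Agent-S.")
    else:
        print(f"Tesseract found at {path}")
```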

API Configuration

You need to configure your API keys for the language models. Choose one of the following methods:

Option 1: Environment Variables

Add your API keys to your shell configuration file (e.g., .bashrc or .zshrc):

export OPENAI_API_KEY=<YOUR_API_KEY>
export ANTHROPIC_API_KEY=<YOUR_ANTHROPIC_API_KEY>
export HF_TOKEN=<YOUR_HF_TOKEN>

Option 2: Python Script

Set environment variables within your Python script:

import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"
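Whichever option you use, checking for missing keys before launching surfaces configuration mistakes early. The sketch below assumes a provider-to-variable mapping built from the variable names above (OPENAI_API_KEY, ANTHROPIC_API_KEY, HF_TOKEN); both the mapping and the missing_api_keys helper are hypothetical, not Agent-S's own configuration logic.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical mapping from provider name to the environment variable it reads.
# Assumed from the variable names listed above; consult the Agent-S README for
# the exact variables each provider expects.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "huggingface": "HF_TOKEN",
}


def missing_api_keys(providers: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variables that are unset or empty for the given providers.

    Providers not in PROVIDER_ENV_VARS are skipped rather than reported.
    """
    env = os.environ if env is None else env
    missing = []
    for provider in providers:
        var = PROVIDER_ENV_VARS.get(provider)
        if var is not None and not env.get(var):
            missing.append(var)
    return missing


if __name__ == "__main__":
    # Main model via OpenAI and grounding via Hugging Face, as in the CLI example.
    gaps = missing_api_keys(["openai", "huggingface"])
    if gaps:
        print("Missing:", ", ".join(gaps))
```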

Agent-S supports multiple model providers, including Azure OpenAI, Anthropic, Gemini, OpenRouter, and vLLM for self-hosted inference. For optimal performance, UI-TARS-1.5-7B is the recommended grounding model.

Examples

Agent-S can be run via a command-line interface (CLI) or integrated into your Python projects using its SDK.

CLI Usage

The recommended setup for Agent S3 involves using OpenAI gpt-5-2025-08-07 as the main model, paired with UI-TARS-1.5-7B for grounding.

Run Agent S3 with the required parameters:

agent_s \
    --provider openai \
    --model gpt-5-2025-08-07 \
    --ground_provider huggingface \
    --ground_url http://localhost:8080 \
    --ground_model ui-tars-1.5-7b \
    --grounding_width 1920 \
    --grounding_height 1080
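Assuming the grounding model predicts points in a fixed --grounding_width by --grounding_height space (1920 x 1080 in this example) that then has to be mapped onto the real display, the mapping is a proportional rescale along each axis. The sketch below illustrates that assumption only; the helper name rescale_point is hypothetical and this is not Agent-S's internal code.

```python
from typing import Tuple


def rescale_point(x: float, y: float,
                  grounding_size: Tuple[int, int],
                  screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Map a point from the grounding model's coordinate space to screen pixels.

    Hypothetical helper: assumes both spaces share a top-left origin and differ
    only by independent horizontal and vertical scale factors.
    """
    gw, gh = grounding_size
    sw, sh = screen_size
    return round(x * sw / gw), round(y * sh / gh)


# A click predicted at the centre of a 1920x1080 grounding space lands at the
# centre of a 2560x1440 display.
print(rescale_point(960, 540, (1920, 1080), (2560, 1440)))  # (1280, 720)
```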

Local Coding Environment (Optional)

For tasks requiring code execution, enable the local coding environment:

agent_s \
    --provider openai \
    --model gpt-5-2025-08-07 \
    --ground_provider huggingface \
    --ground_url http://localhost:8080 \
    --ground_model ui-tars-1.5-7b \
    --grounding_width 1920 \
    --grounding_height 1080 \
    --enable_local_env

Warning: The local coding environment executes arbitrary Python and Bash code locally. Use this feature only in trusted environments and with trusted inputs.
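If you launch Agent S3 from a script, building the argument list in one place keeps the two invocations above consistent. This is a hypothetical sketch that reproduces only the flags shown above; the function name and defaults are assumptions, and it does not validate values against the real CLI.

```python
import shlex
from typing import List


def build_agent_s_command(enable_local_env: bool = False) -> List[str]:
    """Assemble the agent_s invocation shown above as an argv list.

    Hypothetical helper: values mirror the recommended setup (gpt-5-2025-08-07
    as the main model, UI-TARS-1.5-7B served at localhost:8080 for grounding).
    """
    cmd = [
        "agent_s",
        "--provider", "openai",
        "--model", "gpt-5-2025-08-07",
        "--ground_provider", "huggingface",
        "--ground_url", "http://localhost:8080",
        "--ground_model", "ui-tars-1.5-7b",
        "--grounding_width", "1920",
        "--grounding_height", "1080",
    ]
    if enable_local_env:
        # Executes arbitrary Python and Bash locally; only use in trusted setups.
        cmd.append("--enable_local_env")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command; pass the list to subprocess.run(...) to launch it.
    print(shlex.join(build_agent_s_command(enable_local_env=True)))
```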

SDK Usage Snippet

Here's a brief example of how to use the gui_agents SDK to query the agent:

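import io

import pyautogui
from gui_agents.s3.agents.agent_s import AgentS3
from gui_agents.s3.agents.grounding import OSWorldACI

# ... (engine_params and grounding_engine_params setup as per README) ...

# Grounding agent, configured from the grounding model settings.
grounding_agent = OSWorldACI(
    # ... parameters ...
)

# Main agent, configured from the main model settings and the grounding agent.
agent = AgentS3(
    # ... parameters ...
)

# Capture the current screen and encode it as PNG bytes.
screenshot = pyautogui.screenshot()
buffered = io.BytesIO()
screenshot.save(buffered, format="PNG")
screenshot_bytes = buffered.getvalue()

# The observation passed to the agent carries the raw screenshot bytes.
obs = {
    "screenshot": screenshot_bytes,
}

instruction = "Close VS Code"
info, action = agent.predict(instruction=instruction, observation=obs)

# action[0] is a string of Python code; exec runs it directly on this machine.
exec(action[0])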
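Because exec runs whatever code the agent returns, you may want a human confirmation step in front of it, in line with the security notes above. A minimal sketch, assuming only what the snippet shows: predict returns an (info, action) pair whose first element is a Python code string. The run_step wrapper, the confirm callback, and the Predictor protocol are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Dict, Protocol, Tuple


class Predictor(Protocol):
    # Structural type matching the predict call in the snippet (assumed signature).
    def predict(self, instruction: str,
                observation: Dict[str, Any]) -> Tuple[Any, list]: ...


def run_step(agent: Predictor, instruction: str, obs: Dict[str, Any],
             confirm: Callable[[str], bool]) -> bool:
    """Ask the agent for one action and execute it only if confirm() approves.

    Returns True if the action was executed, False if it was rejected.
    """
    _info, action = agent.predict(instruction=instruction, observation=obs)
    code = action[0]
    if not confirm(code):
        return False
    exec(code)  # Still arbitrary code: approve only actions you have read.
    return True


def ask_user(code: str) -> bool:
    """Console confirmation: show the proposed code and wait for y/N."""
    print("Proposed action:\n" + code)
    return input("Execute? [y/N] ").strip().lower() == "y"
```

With the agent and obs from the snippet, run_step(agent, "Close VS Code", obs, ask_user) takes the place of the bare exec(action[0]) call.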

Why Use Agent-S?

Agent-S offers several advantages:

  • Human-like Computer Interaction: It enables AI agents to understand and operate graphical user interfaces (GUIs) the way a person does, working from screenshots and acting through mouse and keyboard input.
  • State-of-the-Art Performance: With Agent S3, the framework achieves leading results on benchmarks like OSWorld, WindowsAgentArena, and AndroidWorld, demonstrating strong generalization capabilities.
  • Open and Extensible Framework: Being open-source, Agent-S provides a flexible foundation for researchers and developers to build upon, customize, and integrate into their own projects.
  • Multi-Platform Support: It runs on Windows, macOS, and Linux, so the same framework can be used across different environments.
  • Advanced Agentic Capabilities: Features like reflection agents and an optional local coding environment enhance the agent's ability to plan, execute, and debug complex tasks.
  • Flexible Model Integration: Supports a wide range of LLM providers and grounding models, allowing users to choose the best fit for their needs.
