# OpenWebAgent: An Open Toolkit for LLM- and LMM-based Web Agents

OpenWebAgent is an open toolkit for building web agents on large language models (LLMs) and large multimodal models (LMMs) that automate tasks on webpages for the user. It provides the source code for both a browser plugin and a backend server, so developers can integrate and customize their own models. The project was presented as a demo at ACL 2024.

GitHub: https://github.com/THUDM/OpenWebAgent

## Topics

- JavaScript
- Web Agent
- LLM
- LMM
- AI
- Automation
- Browser Extension
- Toolkit

## Introduction

OpenWebAgent is an open toolkit for developing LLM- and LMM-based web agents that automate tasks directly on webpages. Presented as an ACL 2024 demo, it ships the source code for both a browser plugin and a backend server, so users can connect their own models to the backend to build a working web-browsing agent. Its key features are a high-performance HTML parser, a modular interaction workflow, and a streamlined user interface.

## Installation

Setting up OpenWebAgent involves configuring both a browser plugin and a backend server.

### Plugin Setup

To get the browser plugin running:

1.  Download `extension.zip`, unzip it, and load the extracted folder into Chrome as an unpacked extension (enable **Developer mode** on `chrome://extensions`, then click **Load unpacked**).
2.  Alternatively, if you want to modify the source code, install the dependencies in the `plugin` directory:
    ```sh
    cd plugin
    npm install
    ```
3.  Then build the extension:
    ```sh
    npm run build
    ```
    This creates an `openwebagent-extension` folder, which you can load into Chrome as an unpacked extension.
4.  For more detailed instructions, see the `README.md` in the `plugin/` directory.
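
Before loading a built folder into Chrome, it can help to confirm the build actually produced an extension: Chrome expects a `manifest.json` at the root of the folder you select. The Python sketch below performs only that basic check (the file exists, parses as JSON, and declares `manifest_version`). The helper name `check_unpacked_extension` is hypothetical, and it does not validate the rest of the manifest.

```python
import json
from pathlib import Path


def check_unpacked_extension(folder: str) -> dict:
    """Return the parsed manifest if `folder` looks loadable as an unpacked extension.

    Hypothetical helper: it only checks that manifest.json exists at the folder
    root, parses as JSON, and declares manifest_version; it does not validate
    the rest of the manifest.
    """
    manifest_path = Path(folder) / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(f"no manifest.json in {folder!r}; did the build succeed?")
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    if "manifest_version" not in manifest:
        raise ValueError("manifest.json does not declare 'manifest_version'")
    return manifest
```

For example, `check_unpacked_extension("plugin/openwebagent-extension")`, assuming the build writes its output folder inside `plugin/`; adjust the path to wherever the folder actually appears.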

### Server Setup

To set up the backend server:

1.  Configure `config/server_config.yaml` to specify the planner arguments and model. For example:
    ```yaml
    planner_args:
      provider: "openai"
      model: "gpt-4-turbo-2024-04-09"
      n_workers: 2
    ```
2.  Configure your MongoDB Atlas connection. You can also store data locally instead; in either case, update `config/mongo_config.yaml` to match:
    ```yaml
    mongo_args:
      base_url: "<your-url>"
      dbname: "<your-db-name>"
      username: "<your-username>"
    ```
3.  Add your API keys and database password to the `.env` file:
    ```sh
    OPENAI_KEY="<your-token>"
    LOG_DB_PASSWD="<your-db-password>"
    OPENAI_API_URL="<your-openai-url>"  # optional
    ```
4.  Install the required server packages:
    ```bash
    cd server
    bash setup.sh
    ```
5.  Finally, start the server:
    ```shell
    python agent/run_server.py
    ```
6.  For further details, see the `README.md` in the `server/` directory.
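
To show how the values from the `.env` file and `mongo_args` fit together, here is a minimal Python sketch that parses a `.env` file in the simple `KEY="value"` form used above, checks that the required keys are present, and builds a MongoDB Atlas connection string. It assumes `base_url` holds a bare Atlas host (for example `cluster0.example.mongodb.net`) and that `LOG_DB_PASSWD` is the password for `username`. All helper names are hypothetical; this is not necessarily how the OpenWebAgent server loads its configuration, and a real project would more likely use a library such as `python-dotenv`.

```python
from urllib.parse import quote_plus

# Assumption: OPENAI_API_URL is optional, as marked in the .env example.
REQUIRED_ENV_KEYS = ("OPENAI_KEY", "LOG_DB_PASSWD")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments.

    Deliberately minimal: no escapes, multi-line values, or variable expansion.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if value.startswith('"'):
            # Quoted value: keep the text between the quotes, drop any trailing comment.
            end = value.find('"', 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            value = value.split("#", 1)[0].strip()
        env[key.strip()] = value
    return env


def check_env(env: dict[str, str]) -> None:
    """Raise KeyError listing every required key that is missing or empty."""
    missing = [k for k in REQUIRED_ENV_KEYS if not env.get(k)]
    if missing:
        raise KeyError(f"missing required keys in .env: {', '.join(missing)}")


def atlas_uri(mongo_args: dict[str, str], password: str) -> str:
    """Build a mongodb+srv:// URI (hypothetical helper).

    MongoDB requires special characters in the username and password to be
    percent-encoded, which quote_plus does here.
    """
    user = quote_plus(mongo_args["username"])
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{mongo_args['base_url']}/{mongo_args['dbname']}"
```

Encoding the credentials matters in practice: a password such as `p@ss:word` would otherwise break the URI, because `@` and `:` are delimiters in the connection string.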

## Examples

Once both the plugin and the server are running, you can hand the agent a task and let it carry the task out on the current webpage. The plugin connects your browser to the backend: the agent interprets your intent, processes the page content, and executes actions in the browser, while the interface lets you track each step and stay in control of the task.
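
The agent loop described here can be sketched as a model-agnostic step: combine the user's intent, the action history, and the current (already simplified) page into a prompt, ask a model for the next action, validate the reply, and record it. The prompt layout, the `click`/`type`/`finish` action grammar, and all names below are hypothetical and are not OpenWebAgent's actual protocol. The `model` argument is any callable from prompt to text, which is one way to keep the backend open to different LLMs or LMMs.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action grammar: click(<id>), type(<id>, "<text>"), finish()
ACTION_RE = re.compile(r"^(click|type|finish)\((.*)\)$")


@dataclass
class AgentState:
    intent: str
    history: list[str] = field(default_factory=list)


def build_prompt(state: AgentState, page: str) -> str:
    """Combine intent, past actions, and the current page into one prompt."""
    past = "\n".join(state.history) or "(none)"
    return (
        f"Task: {state.intent}\n"
        f"Previous actions:\n{past}\n"
        f"Current page:\n{page}\n"
        "Next action:"
    )


def step(state: AgentState, page: str, model: Callable[[str], str]) -> str:
    """Ask `model` for one action, validate it, and append it to the history."""
    action = model(build_prompt(state, page)).strip()
    if not ACTION_RE.match(action):
        raise ValueError(f"unrecognised action: {action!r}")
    state.history.append(action)
    return action
```

Rejecting malformed replies before they reach the browser keeps a single bad model output from triggering an arbitrary action; the caller can re-prompt instead.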

## Why Use It

OpenWebAgent offers several practical advantages:

*   **High-Performance HTML Parser**: It simplifies complex HTML structures, which speeds up document processing and improves the agent's accuracy.
*   **Unique Interaction Workflow**: A modular workflow combines user intent, action history, and parsed HTML so that the agent's actions stay coherent, and it makes different models easy to plug in.
*   **Streamlined User Interface**: A ready-to-use interface lets users track the agent's progress and control tasks with minimal setup.
*   **Open Toolkit**: Developers can incorporate their own LLMs or LMMs and adapt the toolkit to their specific needs.
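
The parser's internals are not described here, so as a rough illustration of what "simplifying HTML" can mean for a web agent, the sketch below uses Python's standard `html.parser` to drop scripts and styles, collapse whitespace, and give each interactive element a numeric id that a model can refer to. The element set, the labeling rule, and the output format are assumptions for illustration; this is not OpenWebAgent's parser.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Assumed sets: which tags count as interactive, and which subtrees to drop.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
SKIP = {"script", "style", "noscript", "svg"}


class PageSimplifier(HTMLParser):
    """Reduce a page to numbered interactive elements plus visible text lines."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []
        self.skip_depth = 0  # > 0 while inside a SKIP element
        self.next_id = 0

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in SKIP:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in INTERACTIVE:
            return
        a = dict(attrs)
        # Hypothetical labeling rule: first available descriptive attribute.
        label = a.get("aria-label") or a.get("placeholder") or a.get("value") or a.get("href") or ""
        self.lines.append(f"[{self.next_id}] {tag} {label}".rstrip())
        self.next_id += 1

    def handle_endtag(self, tag: str) -> None:
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data: str) -> None:
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self.skip_depth:
            self.lines.append(text)


def simplify(html: str) -> str:
    """Return a compact, line-per-item text view of `html`."""
    parser = PageSimplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Numbering only the interactive elements keeps the page representation short while still giving the model stable handles for actions such as clicking or typing.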

## Links

*   **GitHub Repository**: [https://github.com/THUDM/OpenWebAgent](https://github.com/THUDM/OpenWebAgent)

If you find OpenWebAgent useful, please consider citing their paper:

```bibtex
@inproceedings{iong2024openwebagent,
    title     = {OpenWebAgent: An Open Toolkit to Enable Web Agents on Large Language Models},
    author    = {Iat Long Iong and Xiao Liu and Yuxuan Chen and Hanyu Lai and Shuntian Yao and Pengbo Shen and Hao Yu and Yuxiao Dong and Jie Tang},
    booktitle = {ACL 2024 System Demonstration Track},
    year      = {2024}
}
```