{"name":"OpenWebAgent: An Open Toolkit for LLM- and LMM-based Web Agents","description":"OpenWebAgent is an open toolkit designed to empower model-based web agents, streamlining human-computer interactions by automating tasks on webpages. It offers a convenient framework for developing LLM- and LMM-based web agents, providing both plugin and server source code for easy integration and customization. This project was featured as an ACL'24 Demo, showcasing its innovative approach to web automation.","github":"https://github.com/THUDM/OpenWebAgent","url":"https://osrepos.com/repo/thudm-openwebagent","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/thudm-openwebagent","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/thudm-openwebagent.md","json":"https://osrepos.com/repo/thudm-openwebagent.json","topics":["JavaScript","Web Agent","LLM","LMM","AI","Automation","Browser Extension","Toolkit"],"keywords":["JavaScript","Web Agent","LLM","LMM","AI","Automation","Browser Extension","Toolkit"],"stars":null,"summary":"OpenWebAgent is an open toolkit designed to empower model-based web agents, streamlining human-computer interactions by automating tasks on webpages. It offers a convenient framework for developing LLM- and LMM-based web agents, providing both plugin and server source code for easy integration and customization. This project was featured as an ACL'24 Demo, showcasing its innovative approach to web automation.","content":"## Introduction\n\nOpenWebAgent is an innovative open toolkit designed to facilitate the development of LLM- and LMM-based web agents. It aims to streamline human-computer interactions by enabling these agents to automate various tasks directly on webpages. 
This project, featured as an ACL'24 Demo, provides both plugin and server source code, allowing users to easily integrate their own models into the backend to create a functional web browsing agent. Key features include a high-performance HTML parser, a unique interaction workflow, and a streamlined user interface.\n\n## Installation\n\nSetting up OpenWebAgent involves configuring both a browser plugin and a backend server.\n\n### Plugin Setup\n\nTo get the browser plugin running:\n\n1.  You can download the `extension.zip` file and unzip it to add it directly to your Chrome browser.\n2.  Alternatively, if you wish to modify the source code, navigate to the `plugin` directory and install dependencies:\n    ```sh\n    cd plugin\n    npm install\n    ```\n3.  Then, build the extension:\n    ```sh\n    npm run build\n    ```\n    This will create an `openwebagent-extension` folder, which you can install as an unpacked plugin in Chrome.\n4.  For more detailed instructions, refer to the `README.md` located in the `plugin/` directory.\n\n### Server Setup\n\nTo set up the backend server:\n\n1.  Configure `config/server_config.yaml` to specify your planner arguments and model. For example:\n    ```yaml\n    planner_args:\n      provider: \"openai\"\n      model: \"gpt-4-turbo-2024-04-09\"\n      n_workers: 2\n    ```\n2.  Configure your MongoDB Atlas. You can also save data locally, but remember to update `config/mongo_config.yaml` accordingly:\n    ```yaml\n    mongo_args:\n      base_url: \"<your-url>\"\n      dbname: \"<your-db-name>\"\n      username: \"<your-username>\"\n    ```\n3.  Add your API keys to the `.env` file:\n    ```yaml\n    OPENAI_KEY=\"<your-token>\"\n    LOG_DB_PASSWD=\"<your-db-password>\"\n    OPENAI_API_URL=\"<your-openai-url>\"  # optional\n    ```\n4.  Download the required server packages:\n    ```bash\n    cd server\n    bash setup.sh\n    ```\n5.  Finally, start the server:\n    ```shell\n    python agent/run_server.py\n    ```\n6.  
For further details, consult the `README.md` in the `server/` directory.\n\n## Examples\n\nOnce both the plugin and server are configured, OpenWebAgent allows you to automate complex tasks on webpages. The ready-to-use plugin integrates seamlessly with your browser, enabling the agent to interpret user intent, process web content, and execute actions. This powerful combination streamlines interactions, making web-based tasks more efficient and automated.\n\n## Why Use It\n\nOpenWebAgent stands out for several compelling reasons:\n\n*   **High-Performance HTML Parser**: It simplifies complex HTML structures, significantly boosting document processing speed and accuracy for the agent.\n*   **Unique Interaction Workflow**: The modular workflow effectively integrates user intent, action history, and parsed HTML, ensuring coherent actions and facilitating easy integration of various models.\n*   **Streamlined User Interface**: The toolkit offers an intuitive, ready-to-use interface where users can effortlessly track processes and control tasks with minimal setup.\n*   **Open Toolkit**: Its open nature allows developers to easily incorporate their own LLM or LMM models, making it highly adaptable and customizable for specific needs.\n\n## Links\n\n*   **GitHub Repository**: [https://github.com/THUDM/OpenWebAgent](https://github.com/THUDM/OpenWebAgent)\n\nIf you find OpenWebAgent useful, please consider citing their paper:\n\n```\n@inproceedings{iong2024openwebagent,\n    title     = {OpenWebAgent: An Open Toolkit to Enable Web Agents on Large Language Models},\n    author    = {Iat Long Iong and Xiao Liu and Yuxuan Chen and Hanyu Lai and Shuntian Yao and Pengbo Shen and Hao Yu and Yuxiao Dong and Jie Tang},\n    booktitle = {ACL 2024 System Demonstration Track},\n    year      = {2024}\n}\n```","metrics":{"detailViews":0,"githubClicks":0},"dates":{"published":null,"modified":"2026-07-02T08:24:52.000Z"}}