OpenWebAgent: An Open Toolkit to Enable Web Agents on Large Language Models

OpenWebAgent is an open-source toolkit for web automation that integrates both large language models (LLMs) and large multimodal models (LMMs). It focuses on improving human-computer interaction on the web and simplifies complex tasks through three components: an advanced HTML parser, a rapid action generation module, and an intuitive user interface.

Key Features

  1. Modular Design: The core of OpenWebAgent is a modular web agent framework. Developers can plug in different models and tools to process web information and automate tasks on the web.

  2. Integration of LLMs and LMMs: By combining large language models with large multimodal models, OpenWebAgent can handle a wide range of web tasks, including understanding user intent, processing complex web structures, and generating appropriate actions.

  3. Advanced HTML Parser: The toolkit includes an advanced HTML parser that can efficiently navigate and extract information from web pages, enabling the automation of complex web tasks.

  4. Rapid Action Generation Module: OpenWebAgent includes a module that can rapidly generate actions based on the user's intent and the current state of the web page. This enables the development of powerful, task-oriented web agents.

  5. Intuitive User Interface: The toolkit comes with an intuitive user interface that makes it easy for users to interact with the web agents and automate tasks on the web.
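
The parser, model, and action-generation features above can be pictured as a small pipeline. The following is a minimal sketch of that idea using only the Python standard library. It is not OpenWebAgent's actual API: every name here (Element, Action, InteractiveElementParser, ActionModel, KeywordModel, parse_page, step) is hypothetical, and KeywordModel is a trivial stand-in for where an LLM or LMM call would go.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List, Optional, Protocol

# Assumption: these tags are "interactive"; a real parser would use a richer rule set.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass
class Element:
    index: int                        # position among the interactive elements found
    tag: str
    attrs: Dict[str, Optional[str]]


@dataclass
class Action:
    kind: str                         # e.g. "click" or "type"
    target: int                       # index of the Element to act on
    text: str = ""


class InteractiveElementParser(HTMLParser):
    """Collects interactive elements so a model sees a compact page summary."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append(Element(len(self.elements), tag, dict(attrs)))


class ActionModel(Protocol):
    """Hypothetical plug-in point: any object with this method can drive the agent."""

    def next_action(self, intent: str, elements: List[Element]) -> Action: ...


class KeywordModel:
    """Stand-in for an LLM/LMM: types the intent into the first input field."""

    def next_action(self, intent: str, elements: List[Element]) -> Action:
        if not elements:
            raise ValueError("no interactive elements on the page")
        for el in elements:
            if el.tag == "input":
                return Action("type", el.index, intent)
        # No input field found: fall back to clicking the first element.
        return Action("click", elements[0].index)


def parse_page(html: str) -> List[Element]:
    """Parse raw HTML into the list of interactive elements."""
    parser = InteractiveElementParser()
    parser.feed(html)
    return parser.elements


def step(model: ActionModel, intent: str, html: str) -> Action:
    """One agent step: parse the page, then ask the model for an action."""
    return model.next_action(intent, parse_page(html))
```

In this sketch, swapping KeywordModel for a class that calls a real model would change nothing in parse_page or step, which is the kind of separation a modular design aims for. For example, step(KeywordModel(), "weather", page_html) returns an Action describing what to do next, and executing that action in a browser is left out here.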

Applications

OpenWebAgent has a wide range of applications, including:

Availability

The OpenWebAgent framework, Chrome plugin, and demo video are available in the project's GitHub repository.

For more information, refer to the paper presented at the ACL 2024 System Demonstration Track.