OpenWebAgent: An Open Toolkit to Enable Web Agents on Large Language Models
OpenWebAgent is an open-source toolkit for web automation that integrates large language models (LLMs) and large multimodal models (LMMs). It focuses on improving human-computer interaction on the web, simplifying complex tasks through three components: an advanced HTML parser, a rapid action generation module, and an intuitive user interface.
Modular Design: The core of OpenWebAgent is a web agent framework with a modular design, which lets developers integrate a variety of models and tools to process web information and automate web tasks.
Integration of LLMs and LMMs: By combining large language models with large multimodal models, OpenWebAgent can handle a wide range of web tasks, including understanding user intent, processing complex web structures, and generating appropriate actions.
Advanced HTML Parser: The toolkit includes an advanced HTML parser that can efficiently navigate and extract information from web pages, enabling the automation of complex web tasks.
Rapid Action Generation Module: OpenWebAgent includes a module that can rapidly generate actions based on the user's intent and the current state of the web page. This enables the development of powerful, task-oriented web agents.
Intuitive User Interface: The toolkit comes with an intuitive user interface that makes it easy for users to interact with the web agents and automate tasks on the web.
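The parse-then-act flow described in the features above can be illustrated with a minimal Python sketch. This is not OpenWebAgent's actual code or API: the names Element, Action, InteractiveElementParser, keyword_planner, and run_step are all hypothetical, the parser is a simplified stand-in built on Python's standard html.parser module, and keyword_planner is a placeholder for the point where a real agent would call an LLM or LMM.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List, Optional

# Tags treated as interactive in this sketch (an assumption, not the toolkit's list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass
class Element:
    index: int   # position among the interactive elements on the page
    tag: str
    label: str   # attribute-based or visible text label


@dataclass
class Action:
    kind: str    # "click", "type", or "none"
    target: int  # index of the Element to act on, -1 if none


class InteractiveElementParser(HTMLParser):
    """Reduces a page to a short list of elements an agent could act on."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []
        self._open: Optional[Element] = None

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        label = (attr_map.get("aria-label") or attr_map.get("placeholder")
                 or attr_map.get("name") or "")
        element = Element(len(self.elements), tag, label)
        self.elements.append(element)
        if tag != "input":  # <input> is a void element and has no text content
            self._open = element

    def handle_data(self, data):
        # Fall back to the visible text when no attribute supplied a label.
        if self._open is not None and not self._open.label:
            self._open.label = data.strip()

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open.tag:
            self._open = None


# Any callable with this shape can be plugged in as the action generator.
Planner = Callable[[str, List[Element]], Action]


def keyword_planner(goal: str, elements: List[Element]) -> Action:
    """Placeholder for a model call: pick the first element named in the goal."""
    for element in elements:
        if element.label and element.label.lower() in goal.lower():
            kind = "type" if element.tag in {"input", "textarea"} else "click"
            return Action(kind, element.index)
    return Action("none", -1)


def run_step(html: str, goal: str, planner: Planner = keyword_planner) -> Action:
    """One agent step: parse the page, then ask the planner for an action."""
    parser = InteractiveElementParser()
    parser.feed(html)
    parser.close()
    return planner(goal, parser.elements)
```

Because run_step only depends on the Planner signature, swapping the keyword matcher for a function that prompts a language model would not require changes to the parser, which is the kind of separation a modular design is meant to allow.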
OpenWebAgent has a wide range of applications, including:
Automated Data Collection: OpenWebAgent can extract data from web pages automatically, reducing the time and effort required for data collection.
Automated Form Filling: The toolkit can fill out forms on websites automatically, making it easier for users to complete online tasks.
Automated Web Testing: OpenWebAgent can be used to automate website testing, helping to verify that sites function correctly and meet user needs.
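As a rough illustration of the form-filling use case, the sketch below plans which fields of a form can be filled from known user data and which still need input. It is an assumption-laden example, not part of OpenWebAgent: FormFieldParser and plan_form_fill are hypothetical names, fields are matched only by their name attribute, and a real agent would hand the resulting steps to a browser rather than return them.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Input types that are not fillable text fields in this sketch.
SKIPPED_INPUT_TYPES = {"submit", "button", "hidden"}


class FormFieldParser(HTMLParser):
    """Collects the name of each fillable field, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if not name:
            return
        if tag == "textarea":
            self.fields.append(name)
        elif tag == "input" and attr_map.get("type", "text") not in SKIPPED_INPUT_TYPES:
            self.fields.append(name)


def plan_form_fill(
    html: str, values: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (field, value) typing steps plus the fields with no known value."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    steps = [(name, values[name]) for name in parser.fields if name in values]
    missing = [name for name in parser.fields if name not in values]
    return steps, missing
```

Reporting the missing fields separately lets the caller decide whether to ask the user, query a model, or leave the field blank instead of guessing a value.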
The OpenWebAgent framework, Chrome plugin, and demo video are available at the following GitHub repository:
For more information, you can also refer to the paper presented at the ACL 2024 System Demonstration Track: