This page organizes papers related to artificial intelligence published around the world. The summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis. The copyright of each paper belongs to its authors and the relevant institutions. When sharing, please cite the source.
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions
Created by
Haebom
Authors
Tao Yu, Zhengbo Zhang, Zhiheng Lyu, Junhao Gong, Hongzhu Yi, Xinming Wang, Yuxuan Zhou, Jiabing Yang, Ping Nie, Yan Huang, Wenhu Chen
Outline
This paper aims to improve the ability of LLMs to interact with dynamic web environments and autonomously acquire external information. We propose BrowserAgent, a more interactive agent that mimics human web browsing and solves complex tasks through browser actions such as scrolling, clicking, and typing. BrowserAgent operates directly on raw web pages via Playwright and is trained in two stages: supervised fine-tuning (SFT) followed by rejection fine-tuning (RFT). Despite using less training data than Search-R1, it achieves competitive results across various Open-QA tasks. We further introduce an explicit memory mechanism that stores key conclusions across steps, strengthening the model's reasoning on long-horizon tasks. BrowserAgent-7B outperforms Search-R1 by approximately 20% on multi-hop QA tasks such as HotpotQA, 2Wiki, and Bamboogle.
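The browsing loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not BrowserAgent's implementation: the bracketed command grammar (`click [id]`, `type [id] [text]`, `scroll [direction]`, `goto [url]`, `go_back`, `stop [answer]`) and the names `parse_action`, `ExplicitMemory`, and `build_prompt` are hypothetical. It shows how model output could be turned into structured browser actions, and how an explicit memory of short conclusions can stand in for the full page history in the next prompt.

```python
import re
from dataclasses import dataclass, field

# Hypothetical action grammar (assumption): BrowserAgent's real format may differ.
# Maps each action name to the number of bracketed arguments it takes.
ARITY = {"click": 1, "type": 2, "scroll": 1, "goto": 1, "go_back": 0, "stop": 1}
ACTION_RE = re.compile(r"^(\w+)((?:\s*\[[^\]]*\])*)\s*$")
ARG_RE = re.compile(r"\[([^\]]*)\]")


@dataclass
class BrowserAction:
    name: str
    args: list[str]


def parse_action(text: str) -> BrowserAction:
    """Parse a model-emitted command such as 'type [12] [capital of France]'."""
    match = ACTION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unparseable action: {text!r}")
    name, args = match.group(1), ARG_RE.findall(match.group(2))
    if name not in ARITY:
        raise ValueError(f"unknown action: {name}")
    if len(args) != ARITY[name]:
        raise ValueError(f"{name} expects {ARITY[name]} argument(s), got {len(args)}")
    return BrowserAction(name, args)


@dataclass
class ExplicitMemory:
    """Short conclusions carried across steps instead of the full page history."""
    notes: list[str] = field(default_factory=list)
    max_notes: int = 8  # hypothetical budget

    def add(self, note: str) -> None:
        note = note.strip()
        if note and note not in self.notes:
            self.notes.append(note)
        # Evict the oldest conclusions once the budget is exceeded.
        while len(self.notes) > self.max_notes:
            self.notes.pop(0)

    def render(self) -> str:
        return "\n".join(f"- {n}" for n in self.notes) or "(empty)"


def build_prompt(question: str, memory: ExplicitMemory, observation: str) -> str:
    """Next-step prompt: question + memory + current page only (no raw history)."""
    return (
        f"Question: {question}\n"
        f"Memory:\n{memory.render()}\n"
        f"Current page:\n{observation}\n"
        "Next action:"
    )


if __name__ == "__main__":
    memory = ExplicitMemory()
    memory.add("Search results list a candidate page for the first hop.")
    print(parse_action("scroll [down]"))
    print(build_prompt("Who directed the film?", memory, "<page text>"))
```

In a live system, each parsed action would then be dispatched to Playwright, for example `page.mouse.wheel(0, 600)` for a downward scroll or `page.fill(selector, text)` for typing, after mapping element ids to selectors (omitted here). Keeping only the memory and the current page in the prompt bounds context length on long-horizon tasks.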
Takeaways, Limitations
• Takeaways:
◦ Improves interaction with dynamic web environments through an agent whose browser actions (scrolling, clicking, typing) mimic human browsing.
◦ Improves the model's generalization through a two-stage training pipeline of SFT followed by RFT.
◦ Strengthens reasoning on long-horizon tasks through an explicit memory mechanism.
◦ Achieves competitive Open-QA results while using less training data than Search-R1.
◦ BrowserAgent-7B improves on Search-R1 by roughly 20% on multi-hop QA benchmarks (HotpotQA, 2Wiki, Bamboogle).
• Limitations:
◦ The paper does not explicitly state its limitations; these remain to be identified through further discussion.
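The two-stage SFT-then-RFT takeaway can be illustrated with the data-selection step of rejection fine-tuning. This is a generic rejection-sampling sketch, not the paper's exact recipe: it assumes several trajectories are sampled per training question from the SFT model, and that a trajectory is accepted when its normalized final answer exactly matches the gold answer. `Trajectory`, `normalize`, `rejection_filter`, and the `max_per_question` cap are hypothetical names and parameters.

```python
import re
import string
from collections import defaultdict
from dataclasses import dataclass

# Generic rejection-sampling filter (assumption), not BrowserAgent's exact recipe.
PUNCT = set(string.punctuation)


@dataclass
class Trajectory:
    question_id: str
    steps: list[str]  # hypothetical: serialized browser actions taken by the agent
    final_answer: str


def normalize(answer: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    answer = "".join(ch for ch in answer.lower() if ch not in PUNCT)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())


def rejection_filter(trajectories: list[Trajectory],
                     gold: dict[str, str],
                     max_per_question: int = 2) -> list[Trajectory]:
    """Keep sampled trajectories whose final answer exactly matches the gold answer."""
    kept: dict[str, list[Trajectory]] = defaultdict(list)
    for traj in trajectories:
        # Skip once this question already has enough accepted samples.
        if len(kept[traj.question_id]) >= max_per_question:
            continue
        if normalize(traj.final_answer) == normalize(gold[traj.question_id]):
            kept[traj.question_id].append(traj)
    # Flatten the per-question groups, preserving sampling order.
    return [t for group in kept.values() for t in group]
```

The accepted trajectories would then be added to the fine-tuning data for a further training round; capping accepted samples per question keeps easy questions from dominating the mix.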