Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

How to Train Your LLM Web Agent: A Statistical Diagnosis

Created by
  • Haebom

Authors

Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia

Outline

This paper presents an efficient compute-allocation strategy for open-source development of LLM-based web agents. To overcome the limitations of prior work, which focuses on single-step tasks and incurs high computational cost, the authors propose a two-stage pipeline: supervised fine-tuning (SFT) that trains a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher, followed by on-policy reinforcement learning. By sampling 1,370 hyperparameter configurations and estimating effective hyperparameters via bootstrapping, they show that combining SFT with on-policy RL outperforms either approach alone on WorkArena and MiniWob++, and matches the peak performance of pure SFT on MiniWob++ with only 55% of the compute. It is also the only strategy examined that closes the performance gap with closed-source models.
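As a rough illustration of the two-stage recipe, the sketch below first fine-tunes a student policy on teacher-labeled traces, then continues with an on-policy policy-gradient phase. This is a minimal toy sketch, not the authors' implementation: a tiny network over a discrete action space stands in for the 8B student, random teacher pairs stand in for real web traces, `run_episode` is a hypothetical single-step environment, and the RL phase uses a plain REINFORCE update rather than the specific algorithm from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins: a tiny policy over a discrete action space replaces the
# 8B student LLM, and random teacher-labeled pairs replace real web traces.
STATE_DIM, N_ACTIONS = 16, 4
policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# --- Stage 1: SFT -- imitate the teacher via cross-entropy on its traces ---
teacher_states = torch.randn(512, STATE_DIM)
teacher_actions = torch.randint(0, N_ACTIONS, (512,))
for _ in range(200):
    loss = nn.functional.cross_entropy(policy(teacher_states), teacher_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()

# --- Stage 2: on-policy RL -- REINFORCE on rewards from fresh rollouts ---
def run_episode(policy):
    """Hypothetical environment: reward 1.0 when the sampled action matches
    a hidden 'correct' action for the state, else 0.0 (single-step for brevity)."""
    state = torch.randn(1, STATE_DIM)
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    reward = float(action.item() == int(state.abs().argmax()) % N_ACTIONS)
    return dist.log_prob(action), reward

baseline = 0.0
for _ in range(500):
    log_prob, reward = run_episode(policy)
    baseline = 0.9 * baseline + 0.1 * reward   # running-mean reward baseline
    loss = -(reward - baseline) * log_prob     # REINFORCE with baseline
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key structural point mirrored here is that the RL phase collects rollouts from the current policy (on-policy), so the SFT warm start determines which behaviors get reinforced.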

Takeaways, Limitations

Takeaways:
  • Presents an efficient compute-allocation strategy for open-source development of LLM-based web agents.
  • Combining supervised fine-tuning (SFT) with on-policy reinforcement learning improves performance while reducing compute cost.
  • Suggests the performance gap with closed-source models can be closed.
  • Demonstrates an efficient hyperparameter-search method based on bootstrapping (see the sketch at the end of this page).
Limitations:
  • Generalization of the proposed method requires further validation.
  • Depends on specific LLM models (Llama 3.1 and 3.3).
  • Performance needs to be evaluated in more diverse and complex web environments.
  • The limited number of sampled configurations (1,370) leaves uncertainty about whether the optimal hyperparameters were reached.
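The bootstrapped hyperparameter estimate mentioned above can be sketched as follows. This is a minimal illustration, assuming each sampled run is recorded as a (configuration, score) pair; the `runs` data and the chosen statistic (the expected best score under a fixed sampling budget, with a confidence interval) are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the paper's 1,370 sampled runs: each run pairs
# a hyperparameter setting with its benchmark score (values are made up).
runs = [
    {"lr": 1e-5, "warmup_frac": 0.3, "score": 0.41},
    {"lr": 3e-5, "warmup_frac": 0.5, "score": 0.47},
    {"lr": 1e-4, "warmup_frac": 0.5, "score": 0.39},
    # ... more sampled configurations ...
]

def bootstrap_best_score(runs, n_boot=10_000):
    """Bootstrap the score of the best run found under a fixed sampling budget.

    Resampling the runs with replacement estimates how much the 'best
    observed' configuration depends on which runs happened to be sampled,
    quantifying the uncertainty noted in the Limitations above.
    """
    scores = np.array([r["score"] for r in runs])
    resampled = rng.choice(scores, size=(n_boot, len(scores)), replace=True)
    best = resampled.max(axis=1)
    return best.mean(), np.percentile(best, [2.5, 97.5])

mean_best, ci = bootstrap_best_score(runs)
print(f"expected best score: {mean_best:.3f}, 95% CI: {ci}")
```

A wide bootstrap interval signals that more hyperparameter samples would likely change which configuration looks best, which is exactly the concern raised in the last limitation.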