Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Created by
  • Haebom

Authors

Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, Jiaqi Wang

Outline

This paper proposes SEAgent, a framework that lets computer use agents (CUAs) learn and evolve autonomously in new software environments without human intervention. Built on large vision-language models (LVLMs), SEAgent masters unfamiliar software through trial-and-error experiential learning: it performs automatically generated tasks that progress from simple to complex. A World State Model evaluates each trajectory step by step, and a Curriculum Generator produces increasingly diverse and challenging tasks. The agent's policy is updated with adversarial imitation on failed actions and Group Relative Policy Optimization (GRPO) on successful ones. The authors also apply an expert-generalization strategy that integrates the experiential insights of specialized agents into a stronger generalist CUA capable of continuous autonomous evolution. On five new software environments within OS-World, SEAgent raises the success rate by 23.2 percentage points (from 11.3% to 34.5%) over UI-TARS, an existing open-source CUA.
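To make the GRPO part of the update more concrete, here is a minimal sketch of the group-relative advantage that GRPO is generally built on: each sampled rollout is scored against the mean and spread of its own group rather than against a learned value function. This is not the authors' implementation. The function name, the reward values, the use of the population standard deviation, and the `eps` term are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    Hypothetical helper: `rewards` holds one scalar per rollout sampled
    for the same task. `eps` guards against division by zero when every
    rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one task (1.0 = judged successful).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group in which every rollout scores the same carries no learning signal.
flat = group_relative_advantages([1.0, 1.0])
```

Rollouts that score above their group's average get a positive advantage and those below get a negative one, so the policy is pushed toward the better samples within each group. In SEAgent the rewards would come from the World State Model's step-by-step judgments; how those judgments map to scalar rewards is not specified in this summary.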

Takeaways, Limitations

Takeaways:
Demonstrates that a CUA can learn and adapt to new software without human intervention.
Provides an effective learning strategy that combines experiential learning with a Curriculum Generator.
Improves general-purpose CUA performance through the expert-generalization strategy.
Raises the success rate on five OS-World software environments from 11.3% to 34.5% compared with UI-TARS, an existing open-source CUA.
Limitations:
Validation is limited to OS-World, so further research is needed to determine how well the approach generalizes.
Additional performance validation in complex, real-world software environments is required.
The design of the World State Model and Curriculum Generator may be tuned to the evaluated environments.
Further analysis is needed of the unpredictable behavior that may arise during the agent's autonomous learning.