Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

GUI Agents: A Survey

Created by
  • Haebom

Authors

Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Lina Yao, Branislav Kveton, Thien Huu Nguyen, Trung Bui, Tianyi Zhou, Ryan A. Rossi, Franck Dernoncourt

Outline

This paper presents a comprehensive survey of graphical user interface (GUI) agents powered by large foundation models. GUI agents are automated systems that interact with digital systems or software applications across various platforms by mimicking human actions such as clicking, typing, and navigating. The paper categorizes the benchmarks, evaluation metrics, architectures, and training methods used for GUI agents, and proposes a unified framework that describes their perception, reasoning, planning, and action capabilities. It also identifies important open challenges and future directions, giving researchers and practitioners an overview of current progress in the field.
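As a rough illustration of the perception, reasoning, planning, and action framework mentioned above, the loop can be sketched in a few lines of Python. This is not code from the paper: the `ToyGUI` login form, the `Action` type, and every function name below are hypothetical, and the rule-based `reason` and `plan` functions stand in for what a foundation model would do in a real agent.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click" or "type"
    target: str        # name of the UI element acted on
    text: str = ""     # text to enter when kind == "type"


@dataclass
class ToyGUI:
    """Hypothetical stand-in for a real application: a two-field login form."""
    inputs: dict = field(default_factory=lambda: {"username": "", "password": ""})
    submitted: bool = False

    def apply(self, action: Action) -> None:
        if action.kind == "type" and action.target in self.inputs:
            self.inputs[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def perceive(gui: ToyGUI) -> dict:
    """Perception: turn the interface into a structured observation."""
    return {"inputs": dict(gui.inputs), "submitted": gui.submitted}


def reason(observation: dict, goal: dict) -> list:
    """Reasoning: find the fields whose contents still differ from the goal."""
    return [name for name, value in goal.items()
            if observation["inputs"].get(name) != value]


def plan(missing: list, goal: dict) -> list:
    """Planning: fill the missing fields first, then press submit."""
    steps = [Action("type", name, goal[name]) for name in missing]
    steps.append(Action("click", "submit"))
    return steps


def run_agent(gui: ToyGUI, goal: dict, max_steps: int = 10) -> list:
    """Action: execute one planned step at a time, re-perceiving after each."""
    history = []
    for _ in range(max_steps):
        observation = perceive(gui)
        if observation["submitted"]:
            break
        next_action = plan(reason(observation, goal), goal)[0]
        gui.apply(next_action)
        history.append(next_action)
    return history
```

Calling `run_agent(ToyGUI(), {"username": "ada", "password": "s3cret"})` fills both fields and then clicks submit. Replanning after every step, rather than executing the whole plan at once, is one simple way to let the agent react if the interface changes between actions.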

Takeaways, Limitations

Takeaways:
Provides a comprehensive survey and analysis of the GUI agent field.
Presents a unified framework for GUI agents (perception, reasoning, planning, and action).
Gives a clear picture of the current state of the technology, existing benchmarks, and open challenges.
Suggests future research directions.
Limitations:
This paper does not propose or experimentally verify a specific GUI agent system. Instead, it focuses on synthesizing and analyzing existing research.
It lacks discussion of the ethical and social implications of GUI agents.
Because the field is developing rapidly, the survey cannot reflect research published after it.