Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Benchmarking Mobile Device Control Agents across Diverse Configurations

Created by
  • Haebom

Authors

Juyong Lee, Taywon Min, Minyong An, Dongyoon Hahm, Haeone Lee, Changyeon Kim, Kimin Lee

Outline

B-MoCA is a new benchmark for evaluating mobile device control agents. It is built on the Android operating system and contains 131 common tasks. To measure generalization, it randomizes the configuration of the mobile device, including the user interface layout and language settings. The benchmark covers several kinds of agents: agents that use large language models (LLMs) or multimodal LLMs, and agents trained by imitation learning on expert demonstrations. The results show that these agents handle simple tasks well but perform poorly on complex ones, which points to important directions for future research. The source code is publicly available.
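The evaluation idea described above (run the same task under randomly varied device configurations and report how often the agent succeeds) can be sketched roughly as follows. This is only an illustration and not B-MoCA's actual code or API: `DeviceConfig`, `sample_config`, `success_rates`, the language list, and the `run_episode` callback are all hypothetical names introduced here.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DeviceConfig:
    """Hypothetical device configuration; the fields are illustrative assumptions."""
    language: str          # UI language setting
    layout_seed: int       # seed that would determine the UI layout


def sample_config(rng: random.Random) -> DeviceConfig:
    """Draw one random configuration (the language list is an assumption)."""
    return DeviceConfig(
        language=rng.choice(["en", "ko", "es", "ar"]),
        layout_seed=rng.randrange(1_000_000),
    )


def success_rates(
    run_episode: Callable[[str, DeviceConfig], bool],
    tasks: List[str],
    num_configs: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Run every task on the same set of random configurations.

    `run_episode(task, config)` is a placeholder for whatever actually drives
    the agent on a device; it should return True when the task succeeded.
    """
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(num_configs)]
    return {
        task: sum(bool(run_episode(task, cfg)) for cfg in configs) / num_configs
        for task in tasks
    }


if __name__ == "__main__":
    # Stub agent that only succeeds when the UI is in English, to show how
    # a configuration-sensitive agent would score below 100%.
    def english_only_agent(task: str, cfg: DeviceConfig) -> bool:
        return cfg.language == "en"

    print(success_rates(english_only_agent, ["open_calculator"], num_configs=20))
```

Fixing the seed keeps the sampled configurations identical across agents, so differences in success rate reflect the agents rather than the draw of configurations.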

Takeaways, Limitations

Takeaways: Provides a standardized benchmark for research on mobile device control agents; enables performance comparison and analysis across different kinds of agents; supports evaluation of how well agents generalize across device configurations; and points to a future research direction (improving performance on complex tasks).
Limitations: The tasks in the current benchmark are limited in complexity; the benchmark environment differs from real-world usage; it depends on specific Android versions and devices; and it may not adequately reflect the variety of mobile devices and user behaviors.