Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment

Created by
  • Haebom

Authors

Gaole Dai, Shiqi Jiang, Ting Cao, Yuanchun Li, Yuqing Yang, Rui Tan, Mo Li, Lili Qiu

Outline

V-Droid is an agent for automating mobile GUI tasks. Unlike earlier agents, which use an LLM as a generator that directly produces the action for each step, V-Droid uses the LLM as a verifier that evaluates candidate actions before one is chosen. To make this work, the paper presents a framework with three parts: a dedicated workflow for building and pre-populating a discretized action space, interactive progress preference learning, and a scalable human-agent joint annotation scheme. On the mobile task automation benchmarks AndroidWorld, AndroidLab, and MobileAgentBench, V-Droid reports success rates of 59.5%, 38.3%, and 49%, respectively, which are higher than those of existing agents. It also runs at 4.3 seconds per step, reported as 6.1x faster than existing agents. The source code is available on GitHub.
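The verifier-driven selection described above can be sketched roughly as follows. This is a minimal illustration, not V-Droid's implementation: the `Action` schema, the `select_action` helper, and the `toy_verifier` scorer are all hypothetical names introduced here, and the toy scorer simply stands in for what would be an LLM call in a real agent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Action:
    """One entry in a discretized action space (hypothetical schema)."""
    kind: str          # e.g. "click", "type", "navigate_back"
    target: str = ""   # e.g. a label for the UI element acted on


# A verifier maps (task, candidate action) to a score; higher means the
# action is judged more likely to advance the task. In a real agent this
# would be an LLM call; here it can be any callable (assumption).
Verifier = Callable[[str, Action], float]


def select_action(task: str, candidates: Sequence[Action], verifier: Verifier) -> Action:
    """Score every candidate with the verifier and return the top-scoring one."""
    if not candidates:
        raise ValueError("the action space for this step is empty")
    return max(candidates, key=lambda action: verifier(task, action))


def toy_verifier(task: str, action: Action) -> float:
    """Stand-in scorer: counts words of the target that also occur in the task."""
    task_words = set(task.lower().split())
    return float(sum(word in task_words for word in action.target.lower().split()))


if __name__ == "__main__":
    candidates = [
        Action("click", "Settings button"),
        Action("click", "Wi-Fi toggle"),
        Action("navigate_back"),
    ]
    print(select_action("turn on wi-fi", candidates, toy_verifier))
```

The point of the sketch is the shape of the loop: the model never has to produce an action from scratch, it only has to rank a finite list that was enumerated beforehand, which is why the action space must be discretized first.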

Takeaways, Limitations

Takeaways:
Presents a new mobile agent paradigm that uses an LLM as a verifier rather than a generator.
Achieves higher task success rates and lower latency than existing agents.
Proposes a scalable human-agent joint annotation scheme for efficient data collection.
Improves reproducibility and supports follow-up work by releasing the source code.
Limitations:
Results are reported only on specific benchmarks, so generalizability requires further study.
The verifier's own performance is not analyzed in detail.
Applicability to a wider range of mobile tasks needs further validation.