This page curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
To address the reliability issues of mobile GUI agents based on Large Foundation Models (LFMs), this paper proposes VeriSafe Agent (VSA), a formal verification system. VSA automatically formalizes natural-language user commands into formally verifiable specifications, enabling real-time verification that agent behaviors accurately reflect user intent. Implemented using GPT-4o, VSA achieved an accuracy of 94.33%-98.33% on 300 user commands from 18 mobile apps, improving over existing methods by 30.00%-16.33%, and raising GUI agents' task completion rates by 90%-130%. This represents the first attempt to bridge the gap between LFM-based behavior and formal software verification.
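To make the idea concrete, here is a minimal sketch of the autoformalize-then-verify loop described above. All names (`Spec`, `formalize`, `verify_step`) and the hard-coded example command are illustrative assumptions, not the paper's actual interface; in VSA the formalization step is performed by an LFM (GPT-4o in the paper), whereas here one command is hard-coded to keep the sketch self-contained.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """One GUI action taken by the agent."""
    kind: str       # e.g. "tap", "type"
    target: str     # GUI element identifier
    value: str = ""

@dataclass(frozen=True)
class Spec:
    """A formalized user intent (hypothetical, simplified format)."""
    allowed_targets: frozenset  # elements the command permits touching
    goal_value: str             # value the task must produce

def formalize(command: str) -> Spec:
    # In VSA this translation is done automatically by an LFM; here we
    # hard-code a single command so the sketch runs without a model.
    assert command == "set an alarm for 7am"
    return Spec(
        allowed_targets=frozenset({"alarm_time", "save_button"}),
        goal_value="7:00 AM",
    )

def verify_step(spec: Spec, action: Action) -> bool:
    # Real-time check: reject any action that strays outside the
    # formalized user intent before it reaches the device.
    return action.target in spec.allowed_targets
```

For example, with the spec above a `type` action on `alarm_time` passes verification, while a `tap` on an unrelated `delete_all_alarms` element is rejected before execution.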
Takeaways, Limitations
• Takeaways:
◦ We present a novel formal verification system that can significantly improve the reliability and security of LFM-based mobile GUI agents.
◦ We demonstrate that automatic formalization can effectively transform natural-language commands into formal specifications.
◦ VSA's superior performance is verified through experiments on real mobile apps.
◦ We present a new paradigm for ensuring the safety of LFM-based systems.
• Limitations:
◦ VSA currently relies on GPT-4o; generalizability to other LFMs requires further research.
◦ The evaluation covers 18 apps and 300 user commands, which may not span all types of mobile tasks.
◦ The accuracy and completeness of the automatic formalization process need further verification.
◦ Further research is needed on handling complex situations and ambiguous natural-language commands.