This page organizes papers related to artificial intelligence published around the world. This page is summarized using Google Gemini and is operated on a non-profit basis. The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
Created by
Haebom
Authors
Zhengxi Lu, Jiabo Ye, Fei Tang, Yongliang Shen, Haiyang Xu, Ziwei Zheng, Weiming Lu, Ming Yan, Fei Huang, Jun Xiao, Yueting Zhuang
Outline
This paper studies GUI agents that automate complex user-interface interactions through reinforcement learning. Conventional offline reinforcement learning enables stable training but lacks the reward signals needed for multi-step task execution; online reinforcement learning captures these signals but suffers from sparse rewards and high deployment costs. To address this, the paper presents Semi-online Reinforcement Learning, a novel paradigm that simulates online reinforcement learning on offline trajectories. During each rollout, the model's original outputs within the multi-turn conversation are preserved, and a Patch Module adaptively recovers divergences between the rollout and the expert trajectory. To capture long-term training signals, discounted future returns are introduced into the reward computation, and the policy is optimized with weighted step-level and episode-level advantages. The paper also introduces Semi-Online Performance (SOP), a practical and effective surrogate metric that correlates closely with real online performance. Experiments show that the proposed semi-online RL achieves the best performance among 7B models on four dynamic benchmarks, with significant gains over the baseline model (e.g., +12.0% on AndroidWorld and +23.8% on AITW), substantially narrowing the gap between offline training efficiency and online multi-turn inference. The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1 .
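The semi-online rollout described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the binary step reward, and the discount factor are all assumptions made for the sketch.

```python
# Hedged sketch of a semi-online RL rollout over an offline expert trajectory.
# All names and the binary step reward are illustrative assumptions.

GAMMA = 0.9  # discount factor for future returns (illustrative value)

def step_reward(model_action, expert_action):
    """Binary step-level reward: 1 if the rollout matches the expert step."""
    return 1.0 if model_action == expert_action else 0.0

def patch(model_action, expert_action):
    """Patch module (simplified): when the rollout diverges from the expert
    path, substitute the expert action so the multi-turn rollout can continue."""
    return model_action if model_action == expert_action else expert_action

def semi_online_rollout(policy, expert_trajectory):
    """Roll out the policy along an offline expert trajectory.

    The model's own outputs are evaluated at each step; the patch module
    recovers divergences so that later steps still receive a valid context.
    Returns per-step rewards and discounted future returns.
    """
    history, rewards = [], []
    for obs, expert_action in expert_trajectory:
        model_action = policy(obs, history)
        rewards.append(step_reward(model_action, expert_action))
        # Continue the rollout from the patched (expert-aligned) action
        # whenever the model's action diverges from the expert path.
        history.append(patch(model_action, expert_action))
    # Discounted future return: G_t = sum_{k >= t} GAMMA^(k - t) * r_k
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
        returns.append(g)
    return rewards, list(reversed(returns))
```

For example, a toy policy that always emits `"tap"` against the expert steps `tap, swipe, tap` earns rewards `[1, 0, 1]`, and the discounted returns propagate the final success back to earlier steps.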
◦ A proposal for semi-online reinforcement learning that combines the stability of offline RL with the multi-step task execution capability of online RL.
◦ A Patch Module that adaptively recovers divergences between rollouts and expert trajectories, combined with discounted future returns to capture long-term training signals.
◦ The Semi-Online Performance (SOP) metric, which correlates closely with actual online performance.
◦ Demonstrated practical utility through performance gains over existing models across multiple benchmarks.
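The weighted step-level and episode-level advantages mentioned above might be combined as in the sketch below. The group-normalized baseline and the weighting coefficient `alpha` are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of combining step-level and episode-level advantages.
# Group normalization and the alpha weighting are illustrative assumptions.
from statistics import mean, pstdev

def group_advantages(values):
    """Normalize a group of sampled returns against the group mean and
    standard deviation (illustrative baseline, not the paper's exact one)."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a zero-variance group
    return [(v - mu) / sigma for v in values]

def weighted_advantage(step_adv, episode_adv, alpha=0.5):
    """Weighted combination of a step-level and an episode-level advantage."""
    return alpha * step_adv + (1 - alpha) * episode_adv
```

Here a step whose discounted return beats the group average gets a positive step-level advantage, and the episode-level term shifts every step of a successful episode upward, so both local and long-horizon signals shape the policy update.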
• Limitations:
◦ Further verification of the generalization ability of the proposed method is needed.
◦ Performance evaluation and comparative analysis across model sizes are needed.
◦ Further research is needed on the precise correlation between the SOP metric and actual online performance.
◦ Analysis of the complexity and computational cost of the Patch Module is needed.