Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

OpenCUA: Open Foundations for Computer-Use Agents

Created by
  • Haebom

Authors

Xinyuan Wang, Bowen Wang, Dunjie Lu, Junlin Yang, Tianbao Jixuan Chen, Yuxiao Ye, Danyang Zhang, Dikang Du, Hao Hu, Huarong Chen, Zaida Zhou, Haotian Yao, Ziwei Chen, Qizheng Gu, Yipu Wang, Heng Wang, Diyi Yang, Victor Zhong, Flood Sung, Y. Charles, Zhilin Yang, Tao Yu

Outline

This paper proposes OpenCUA, an open-source framework intended to advance the capabilities and accessibility of computer-use agents (CUAs). OpenCUA consists of an annotation infrastructure that captures human computer-use demonstrations; AgentNet, a large-scale computer-use task dataset spanning three operating systems and over 200 applications and websites; and a scalable pipeline that converts these demonstrations into state-action pairs. The OpenCUA-32B model reaches a 34.8% success rate on the OSWorld-Verified benchmark, the highest among open-source models and above OpenAI CUA (GPT-4o). By releasing the annotation tools, datasets, code, and models, the study lays a foundation for further CUA research.
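As a rough illustration of what converting a recorded demonstration into state-action pairs can look like, the minimal Python sketch below pairs each recorded action with the state in which it was taken. The Event schema, its field names, the function to_state_action_pairs, and the sample data are all hypothetical and are not taken from the OpenCUA code or the AgentNet format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One recorded step of a human demonstration (hypothetical schema)."""
    screenshot: str  # identifier of the screen captured before the action
    action: str      # e.g. "click(120, 340)" or "type('hello')"


def to_state_action_pairs(task: str, events: List[Event]) -> List[Tuple[dict, str]]:
    """Pair each action with the state (task, current screen, prior actions) it was taken in."""
    pairs = []
    history: List[str] = []
    for event in events:
        state = {
            "task": task,
            "screenshot": event.screenshot,
            "history": list(history),  # copy so later appends do not alter earlier states
        }
        pairs.append((state, event.action))
        history.append(event.action)
    return pairs


# Made-up demonstration used only to exercise the function.
demo = [
    Event("shot_0.png", "click(120, 340)"),
    Event("shot_1.png", "type('report.txt')"),
    Event("shot_2.png", "press('Enter')"),
]
pairs = to_state_action_pairs("Save the file as report.txt", demo)
```

Each element of pairs is a (state, action) tuple, which is the general shape a supervised training example for an agent might take under these assumptions.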

Takeaways, Limitations

Takeaways:
Advances CUA research and makes it more accessible by providing an open-source CUA framework.
Releases AgentNet, a large-scale computer-use task dataset.
Raises the performance ceiling of open-source models with OpenCUA-32B.
Demonstrates generalization across domains and performance gains from increased test-time compute.
Limitations:
Performance remains far from saturated (34.8% success rate); further research is needed to improve it.
The scope and diversity of the AgentNet dataset may need further expansion.
The data may be biased toward specific operating systems and applications.