This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
This paper tackles the design of autonomous agents for graphical user interfaces (GUIs) in specialized domains such as scientific computing, where agents must combine long-horizon planning with precise execution. Existing generalist and specialist agents face a tradeoff between these two capabilities. The authors present CODA, a trainable, compositional framework that couples a generalist planner (Cerebrum) with a specialist executor (Cerebellum). CODA is trained through a two-stage pipeline. In the first stage, Specialization, an expert planner is trained separately for each scientific application. In the second stage, Generalization, the successful trajectories from all applications are aggregated and used for supervised fine-tuning of a single final planner. This gives CODA both robust execution and cross-domain generalization. On four applications from the ScienceBoard benchmark, CODA significantly outperforms prior methods and achieves the best results among open-source models.
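The two-stage pipeline described above can be sketched in pseudocode-like Python. This is an illustrative outline only, not the paper's actual implementation: all names (`Trajectory`, `train_specialist`, the example domains and tasks) are hypothetical, and the rollout step is simulated rather than run against real GUI environments.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """One agent rollout in a scientific GUI application (illustrative)."""
    domain: str
    steps: list
    success: bool

def train_specialist(domain, tasks):
    """Stage 1 (Specialization): train a per-application expert planner
    and collect its rollout trajectories. Here rollouts are simulated:
    we fabricate alternating successful/failed trajectories per task."""
    return [
        Trajectory(domain, steps=[task, "click", "done"], success=(i % 2 == 0))
        for i, task in enumerate(tasks)
    ]

def aggregate_successes(all_trajectories):
    """Stage 2 (Generalization): keep only successful trajectories across
    every domain to form the supervised fine-tuning (SFT) dataset for
    the single final planner."""
    return [t for t in all_trajectories if t.success]

def build_sft_dataset(domains):
    """Run Stage 1 per domain, then pool successes for Stage 2."""
    trajectories = []
    for domain, tasks in domains.items():
        trajectories.extend(train_specialist(domain, tasks))
    return aggregate_successes(trajectories)

# Hypothetical application names, loosely inspired by scientific software.
sft_data = build_sft_dataset({
    "Lean4": ["prove_lemma"],
    "GIS": ["load_map", "measure_area"],
})
```

The key design point the sketch captures is that cross-domain generalization comes only from the aggregation step: each specialist sees one application, while the final planner is fine-tuned on the union of their successful trajectories.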
Takeaways and Limitations
• Takeaways:
◦ A novel approach that improves the performance of autonomous GUI agents in scientific computing.
◦ Overcomes the tradeoff between planning and execution by combining generalist planning with specialist execution skills.
◦ Adapts from experience via a learnable, compositional framework.
◦ Achieves strong performance even in limited-data settings.
◦ Best performance among open-source models.
• Limitations:
◦ Further evaluation of the framework's generalizability is needed.
◦ Scalability to additional scientific domains and more complex GUI environments remains to be verified.
◦ Evaluation on benchmarks beyond ScienceBoard is needed.
◦ Dependence on the quality of training data needs to be assessed.