This page organizes papers related to artificial intelligence published around the world. The summaries are generated with Google Gemini, and the page is operated on a non-profit basis. Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.
MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
Created by Haebom
Authors
Ran Xu, Yuchen Zhuang, Yishan Zhong, Yue Yu, Zifeng Wang, Xiangru Tang, Hang Wu, May D. Wang, Peifeng Ruan, Donghan Yang, Tao Wang, Guanghua Xiao, Xin Liu, Carl Yang, Yang Xie, Wenqi Shi
Outline
MedAgentGym is a scalable, interactive training environment designed to strengthen the code-based biomedical reasoning of LLM agents. It contains 72,413 task instances across 129 categories, derived from 12 real-world biomedical scenarios. Each task is encapsulated in an executable sandbox environment with a detailed task specification, interactive feedback, a verifiable answer annotation, and support for scalable generation of training trajectories. Extensive benchmarking of 29 LLMs revealed a significant performance gap between commercial and open-source LLMs in biomedical data science. Using MedAgentGym's efficient multi-threaded, multi-turn trajectory sampling, Med-Copilot achieved performance gains of +43.02% from offline reinforcement learning and +45.28% from online reinforcement learning, demonstrating that MedAgentGym is an effective training platform. Med-Copilot thereby emerges as a cost-effective, privacy-preserving alternative to proprietary LLMs such as gpt-4o. By combining a unified execution environment with comprehensive benchmarks and accessible, scalable training resources, MedAgentGym offers a single platform for developing LLM-based coding assistants for advanced biomedical data science.
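To make the ideas of "executable sandbox", "interactive feedback", "verifiable answer annotation", and "multi-threaded, multi-turn trajectory sampling" concrete, here is a minimal sketch in Python. It is not MedAgentGym's actual API: the names `Task`, `run_in_sandbox`, `verify`, `rollout`, `scripted_agent`, and `sample_trajectories`, the task record layout, and the exact-match verifier are all hypothetical assumptions for illustration. A plain subprocess with a timeout stands in for a real sandbox (it does not isolate files or network), and a scripted stub replaces the LLM agent.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task record: a specification plus a verifiable answer.
    task_id: str
    spec: str
    answer: str


@dataclass
class Trajectory:
    task_id: str
    turns: list = field(default_factory=list)  # list of (code, feedback) pairs
    solved: bool = False


def run_in_sandbox(code: str, timeout: float = 5.0) -> str:
    """Run code in a separate Python process and return output as feedback.

    Assumption: a subprocess with a timeout is only a stand-in for real
    isolation (e.g., containers); it does not restrict file or network access.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "ERROR: timeout"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        # Report the last traceback line, or the exit code if stderr is empty.
        return "ERROR: " + (lines[-1] if lines else f"exit code {proc.returncode}")
    return proc.stdout.strip()


def verify(output: str, task: Task) -> bool:
    # Assumed verifier: exact string match against the annotated answer.
    return output == task.answer


# Hypothetical scripted attempts standing in for an LLM policy.
SCRIPTED_ATTEMPTS = {
    "mean_hr": [
        "print(sum([72, 80, 88]) / len([]))",  # buggy first attempt
        "print(sum([72, 80, 88]) / len([72, 80, 88]))",  # fixed after feedback
    ],
}


def scripted_agent(task: Task, history: list) -> str:
    # Returns the next scripted attempt; history holds prior (code, feedback).
    attempts = SCRIPTED_ATTEMPTS[task.task_id]
    return attempts[min(len(history), len(attempts) - 1)]


def rollout(task: Task, agent, max_turns: int = 3) -> Trajectory:
    # Multi-turn loop: propose code, execute it, feed the result back.
    traj = Trajectory(task.task_id)
    for _ in range(max_turns):
        code = agent(task, traj.turns)
        feedback = run_in_sandbox(code)
        traj.turns.append((code, feedback))
        if verify(feedback, task):
            traj.solved = True
            break
    return traj


def sample_trajectories(tasks, agent, workers: int = 4):
    # Threads suffice here because each rollout mostly waits on a subprocess.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: rollout(t, agent), tasks))


if __name__ == "__main__":
    tasks = [Task("mean_hr", "Compute the mean heart rate of [72, 80, 88].", "80.0")]
    for traj in sample_trajectories(tasks, scripted_agent):
        print(traj.task_id, traj.solved, len(traj.turns))
```

In this sketch the first attempt fails with a `ZeroDivisionError`, that error text is returned to the agent as feedback, and the second attempt prints `80.0`, which matches the annotation and ends the trajectory. Trajectories collected this way (both failed and successful turns) are the kind of data that offline or online reinforcement learning could consume, though how the paper actually formats and uses them is not specified here.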
Takeaways, Limitations
• Takeaways:
◦ Provides an effective training environment for improving the code-based biomedical reasoning of LLM agents.
◦ Provides benchmarks that reveal the performance gap between commercial and open-source LLMs.
◦ Demonstrates substantial performance gains for Med-Copilot from offline and online reinforcement learning.
◦ Offers a cost-effective, privacy-preserving alternative to proprietary LLMs.
◦ Provides an integrated platform for developing LLM-based coding assistants for advanced biomedical data science.