
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Created by
  • Haebom

Authors

Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu

💡 Overview

This work proposes ScienceBoard, a new environment for evaluating autonomous, multimodal AI agents in scientific research workflows. ScienceBoard simulates realistic workflows across a range of scientific domains. It also provides a benchmark of 169 curated, real-world scientific tasks. Evaluations of existing state-of-the-art agents show that they are still far from reliably supporting complex scientific work.

🔑 Implications and Limitations

• By providing a realistic, dynamic scientific research environment and benchmark, the work makes it possible to evaluate AI agents' ability to solve scientific problems.
• It shows that state-of-the-art LLM-based agents still achieve a low success rate (15%) on complex scientific workflows. Without support from human researchers, they remain limited.
• It analyzes the shortcomings of current agents and proposes design principles for building more capable agents for scientific discovery.
• The benchmark covers realistic scientific tasks, but agents still face complexity and diversity they must overcome. The benchmark should also be extended to a broader range of scientific domains and task types.