KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

Author
• Haebom

Authors

Tongbo Chen, Zhengxi Lu, Zhan Xu, Guocheng Shao, Shaohan Zhao, Fei Tang, Yong Du, Kaitao Song, Yizhou Liu, Yuchen Yan, Wenqi Zhang, Xu Tan, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

💡 Overview

This paper proposes KnowU-Bench, a new online benchmark for evaluating personalized mobile agents that infer user preferences and provide proactive assistance. Unlike existing benchmarks, KnowU-Bench evaluates whether an agent can uncover missing preferences through interaction and decide whether to intervene, ask for the user's consent, or stay silent. To do so, it combines an LLM-based user simulator with an Android emulation environment and provides a comprehensive evaluation over 42 general GUI tasks, 86 personalized tasks, and 64 proactive tasks.
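The three-way proactive decision described above (intervene, ask for consent, or stay silent) can be illustrated with a small scoring sketch. This is not KnowU-Bench's actual harness or metric: the `Decision` enum, the `ProactiveEpisode` record, and the `decision_accuracy` and `decision_confusion` helpers are hypothetical, and scoring by exact match against an assumed gold decision label is an illustrative choice. Only the task counts come from the summary.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """Hypothetical labels for the three choices a proactive agent faces."""
    INTERVENE = "intervene"
    ASK_CONSENT = "ask_consent"
    STAY_SILENT = "stay_silent"


# Task counts as stated in the summary; the category keys are illustrative.
TASK_COUNTS = {"general": 42, "personalized": 86, "proactive": 64}


@dataclass(frozen=True)
class ProactiveEpisode:
    """One proactive episode (hypothetical record format, not the benchmark's)."""
    task_id: str
    expected: Decision   # assumed gold label for what the user actually wanted
    predicted: Decision  # what the agent actually did


def decision_accuracy(episodes: list[ProactiveEpisode]) -> float:
    """Assumed metric: fraction of episodes whose decision matches the gold label."""
    if not episodes:
        return 0.0
    return sum(ep.expected == ep.predicted for ep in episodes) / len(episodes)


def decision_confusion(episodes: list[ProactiveEpisode]) -> Counter:
    """Count (expected, predicted) pairs to separate over- from under-intervention."""
    return Counter((ep.expected, ep.predicted) for ep in episodes)


if __name__ == "__main__":
    episodes = [
        ProactiveEpisode("p1", Decision.INTERVENE, Decision.INTERVENE),
        ProactiveEpisode("p2", Decision.STAY_SILENT, Decision.INTERVENE),
        ProactiveEpisode("p3", Decision.ASK_CONSENT, Decision.ASK_CONSENT),
        ProactiveEpisode("p4", Decision.STAY_SILENT, Decision.STAY_SILENT),
    ]
    print(sum(TASK_COUNTS.values()))    # 192 tasks in total
    print(decision_accuracy(episodes))  # 0.75
    # Unwanted interventions: the user wanted silence but the agent acted.
    print(decision_confusion(episodes)[(Decision.STAY_SILENT, Decision.INTERVENE)])  # 1
```

A confusion count is shown alongside plain accuracy because a single accuracy number cannot distinguish an agent that intervenes too eagerly from one that stays silent too often, and the two failures cost the user differently.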

🔑 Implications and Limitations

• KnowU-Bench sets an important reference point for evaluating whether mobile agents can go beyond merely looking up user preferences, inferring them through dialogue and intervening proactively.
• Experimental results show that even state-of-the-art LLMs degrade substantially once they must go beyond explicit instructions, whether to infer user preferences or to decide when to intervene. This suggests a fundamental gap in their ability to serve as trustworthy personal assistants.
• Future work should focus on the core bottlenecks of preference acquisition and intervention calibration rather than on GUI navigation ability.