
NaturalGAIA: A Verifiable Benchmark and Hierarchical Framework for Long-Horizon GUI Tasks

Created by
  • Haebom

Authors

Zihan Zheng, Tianle Cui, Taoran Wang, Fengtao Wang, Jiahui Pan, Lewei He, Qianglong Chen

💡 Overview

Research on LLM-based GUI agents faces two challenges: building realistic environments and ensuring that evaluation is verifiably accurate. To address both, this paper proposes NaturalGAIA, a verifiable evaluation dataset grounded in real human GUI interaction intents. NaturalGAIA rigorously simulates natural human intent, which is characterized by cognitive nonlinearity and context dependency. The paper also introduces LightManus-Jarvis, a hierarchical collaborative framework for handling these complex, naturalized tasks.

🔑 Implications and Limitations

• The paper highlights the importance of building verifiable datasets that reflect realistic human GUI interaction intent.
• LightManus handles dynamic planning and context evolution, while Jarvis ensures execution accuracy; their collaboration substantially improves performance on complex GUI tasks.
• The proposed model significantly outperforms existing state-of-the-art models while sharply reducing token consumption and execution time, demonstrating its efficiency.
• Future work needs to cover a broader range of GUI task types and complexity, and to examine applicability in real user environments in more depth.
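
The planner/executor split described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: `Planner`, `StepResult`, `run`, and `make_flaky_executor` are hypothetical names introduced here, and the executor is a stub rather than a real GUI agent. The sketch only shows the general pattern of a high-level component that keeps an evolving context and re-issues a subtask when the low-level component reports failure.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StepResult:
    subtask: str
    ok: bool
    observation: str


@dataclass
class Planner:
    """Hypothetical high-level planner: tracks pending subtasks and an evolving context."""
    pending: list[str]
    context: list[str] = field(default_factory=list)

    def next_subtask(self) -> Optional[str]:
        return self.pending[0] if self.pending else None

    def update(self, result: StepResult) -> None:
        # Fold every observation back into the context, success or failure.
        self.context.append(f"{result.subtask}: {result.observation}")
        # Only a verified success advances the plan; a failure is retried.
        if result.ok:
            self.pending.pop(0)


Executor = Callable[[str, list[str]], StepResult]


def run(planner: Planner, execute: Executor, max_steps: int = 10) -> list[StepResult]:
    """Alternate between planning and execution until the plan is empty or the step budget runs out."""
    history: list[StepResult] = []
    for _ in range(max_steps):
        subtask = planner.next_subtask()
        if subtask is None:
            break
        result = execute(subtask, planner.context)
        planner.update(result)
        history.append(result)
    return history


def make_flaky_executor(fail_once_on: str) -> Executor:
    """Stub low-level executor (assumption): fails the named subtask once, then succeeds."""
    already_failed: set[str] = set()

    def execute(subtask: str, context: list[str]) -> StepResult:
        if subtask == fail_once_on and subtask not in already_failed:
            already_failed.add(subtask)
            return StepResult(subtask, False, "element not found")
        return StepResult(subtask, True, "done")

    return execute


if __name__ == "__main__":
    planner = Planner(pending=["open app", "search item", "read result"])
    for step in run(planner, make_flaky_executor("search item")):
        print(step)
```

In this sketch the planner never touches the GUI and the executor never sees the full plan, which is one plausible reading of how a hierarchical design could keep per-step prompts small; the paper itself should be consulted for how LightManus and Jarvis actually divide the work.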