BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

Created by
  • Haebom

Authors

Huanyao Zhang, Jiepeng Zhou, Bo Li, Bowen Zhou, Yanzhe Dan, Haishan Lu, Zhiyong Cao, Jiaoyang Chen, Yuqian Han, Zinan Sheng, Zhengwei Tao, Hao Liang, Jialong Wu, Yang Shi, Yuanpeng He, Jiaye Lin, Qintong Zhang, Guochen Yan, Runhao Zhao, Zhengpin Li, Xiaohan Yu, Lang Mei, Chong Chen, Wentao Zhang, Bin Cui

💡 Overview

This paper proposes BrowseComp-$V^3$, a new benchmark for evaluating the deep search capabilities of multimodal web browsing agents. BrowseComp-$V^3$ consists of 300 complex questions that require multi-step reasoning over web pages mixing textual and visual information, and it is designed so that all supporting evidence is publicly searchable. Beyond final-answer accuracy, it also introduces a mechanism for evaluating the intermediate reasoning process, so that the boundaries of a model's capabilities can be characterized systematically.
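To make the two evaluation levels concrete, here is a minimal sketch of how final-answer accuracy and an intermediate-process score could be tallied side by side. The summary does not describe the benchmark's actual scoring procedure, so the record layout (`QuestionResult`), the notion of per-question "checkpoints", and both function names are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """Outcome for one benchmark question (hypothetical record layout)."""
    final_correct: bool      # did the final answer match the reference?
    checkpoints_hit: int     # intermediate steps the agent got right (assumed annotation)
    checkpoints_total: int   # intermediate steps annotated for the question (assumed)


def final_accuracy(results):
    """Fraction of questions whose final answer is correct."""
    return sum(r.final_correct for r in results) / len(results)


def process_score(results):
    """Mean per-question fraction of intermediate checkpoints reached."""
    return sum(r.checkpoints_hit / r.checkpoints_total for r in results) / len(results)


# Toy data, not taken from the paper.
results = [
    QuestionResult(True, 3, 3),
    QuestionResult(False, 2, 4),
    QuestionResult(False, 0, 2),
    QuestionResult(True, 4, 4),
]
print(f"final accuracy: {final_accuracy(results):.3f}")
print(f"process score:  {process_score(results):.3f}")
```

Reporting the two numbers separately shows when an agent makes partial progress on a question it ultimately answers incorrectly, which a final-answer metric alone would hide.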

🔑 Implications and Limitations

• The benchmark overcomes limitations of existing benchmarks and provides a standard for precisely evaluating the deep search capabilities of multimodal web browsing agents.
• Evaluating the intermediate reasoning process identifies model weaknesses and helps set directions for future research.
• Even current state-of-the-art models reach only 36% accuracy on BrowseComp-$V^3$, revealing a large gap between existing models and robust multimodal deep search in real-world environments.
• Because the proposed benchmark is highly complex and its evaluation criteria are fine-grained, future research should focus on improving models' multimodal information integration and fine-grained perception.