Daily Arxiv

This page collects and organizes artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling

Created by
  • Haebom

Authors

Saurav Jha, M. Jehanzeb Mirza, Wei Lin, Shiqi Yang, Sarath Chandar

💡 Overview

This paper analyzes approaches that use test-time scaling to improve the spatial reasoning of vision-language models (VLMs). In particular, it systematically examines the behavior of world-model-based test-time verifiers such as MindJourney, and its uncertainty analysis finds that the verifier's reward signal is biased and unreliable. As a remedy, the authors propose the "Verification through Spatial Assertions" (ViSA) framework, which grounds verification in fine-grained, frame-based claims that can be checked, and report improved spatial reasoning.
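To make the idea of verifier-guided test-time scaling concrete, the following is a minimal sketch of selecting among candidate imagined views with a pluggable scorer, alongside a random-scoring baseline of the kind the summary mentions. It is not MindJourney's or ViSA's implementation: the function names (`select_views`, `make_random_scorer`, `toy_verifier`), the candidate action names, and the scores are all hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence

Scorer = Callable[[str], float]


def select_views(candidates: Sequence[str], scorer: Scorer, k: int) -> List[str]:
    """Return the k candidates with the highest scores (ties keep input order)."""
    ranked = sorted(candidates, key=scorer, reverse=True)
    return ranked[:k]


def make_random_scorer(seed: int) -> Scorer:
    """Baseline that ignores the view and returns a seeded random score."""
    rng = random.Random(seed)
    return lambda _view: rng.random()


# Hypothetical candidate views, named by the camera action that produced them.
CANDIDATES = ["turn_left_30", "turn_right_30", "move_forward", "stay"]

# Hypothetical verifier scores; a real verifier would be a model call.
TOY_SCORES = {
    "turn_left_30": 0.2,
    "turn_right_30": 0.9,
    "move_forward": 0.6,
    "stay": 0.1,
}


def toy_verifier(view: str) -> float:
    """Stand-in for a learned verifier: look up a fixed helpfulness score."""
    return TOY_SCORES[view]


if __name__ == "__main__":
    # Verifier-guided selection: keeps the two highest-scoring views.
    print(select_views(CANDIDATES, toy_verifier, k=2))
    # Random baseline: same selection code, but scores carry no information.
    print(select_views(CANDIDATES, make_random_scorer(seed=0), k=2))
```

Because the selection code is identical in both calls, any difference in downstream accuracy between the two can be attributed to the scorer. This is one simple way to check whether a verifier's reward signal carries real information.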

🔑 Implications and Limitations

  • MindJourney's verifier produces an unreliable reward signal; the finding that even random scoring can contribute to performance gains exposes the verifier's bias.
  • The ViSA framework improves spatial reasoning on the SAT-Real benchmark and succeeds in balancing exploratory behavior.
  • On harder benchmarks such as MMSI-Bench, current world models suffer from an information bottleneck, so the imagined views fail to enrich fine-grained reasoning.