AttenA+: Rectifying Action Inequality in Robotic Foundation Models

Posted by
  • Haebom

Authors

Daojie Peng, Fulong Ma, Jiahang Cao, Qiang Zhang, Xupeng Xie, Jian Guo, Ping Luo, Andrew F. Luo, Boyu Zhou, Jun Ma

💡 Overview

This paper argues that existing robotic foundation models assume temporal homogeneity and therefore train on every action with equal weight. To address this, it proposes AttenA+, a framework that uses a velocity-based action attention mechanism to assign greater training weight to low-velocity segments, which are the physically critical ones. AttenA+ can be applied to existing models without any additional architecture or parameters, and it substantially improves performance on complex long-horizon robot tasks.
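To make the idea of velocity-based weighting concrete, here is a minimal sketch in plain Python. The summary does not state the paper's actual weighting formula, so everything below is an assumption made for illustration: the function names `velocity_weights` and `weighted_mse`, the `dt` and `temperature` parameters, the finite-difference speed estimate, and the choice of a softmax over negative speed are all hypothetical and may differ from what AttenA+ does.

```python
import math


def velocity_weights(actions, dt=1.0, temperature=1.0):
    """Return one weight per step; slower steps get larger weights (mean weight is 1).

    Hypothetical helper: `actions` is a non-empty list of equal-length vectors,
    for example end-effector positions along a demonstration.
    """
    if not actions:
        raise ValueError("actions must contain at least one step")
    # Finite-difference speed between consecutive steps.
    speeds = [math.dist(prev, curr) / dt for prev, curr in zip(actions, actions[1:])]
    # Pad so there is one speed per step: the last step reuses the previous speed.
    speeds.append(speeds[-1] if speeds else 0.0)
    # Softmax over negative speed (assumed form), shifted by the minimum for stability.
    slowest = min(speeds)
    raw = [math.exp(-(s - slowest) / temperature) for s in speeds]
    total = sum(raw)
    # Rescale so the weights average to 1 and the loss scale stays comparable.
    return [len(raw) * r / total for r in raw]


def weighted_mse(pred, target, weights):
    """Mean squared error per step, averaged with the given per-step weights."""
    per_step = [
        sum((p - t) ** 2 for p, t in zip(ps, ts)) / len(ts)
        for ps, ts in zip(pred, target)
    ]
    return sum(w * e for w, e in zip(weights, per_step)) / len(per_step)


# A fast approach followed by a slow, precise phase (made-up numbers).
demo = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.1, 0.0), (2.2, 0.0)]
weights = velocity_weights(demo)
```

In this sketch the weights only rescale an existing per-step loss, which matches the summary's claim that no extra architecture or parameters are needed. With uniform speed, every weight is 1 and `weighted_mse` reduces to the ordinary mean squared error.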

🔑 Implications and Limitations

• Accounting for the physical importance of action sequences: model performance can be improved by concentrating training on low-velocity movements, which have a decisive effect on whether a robot task succeeds.
• Improving existing models: applying AttenA+ to existing Vision-Language-Action (VLA) models and World-Action Models (WAM) can push state-of-the-art performance further without any architectural changes.
• A new direction for building foundation models: the work suggests that exploiting the physical structure inherent in action sequences can be a new path toward more efficient development of robot control models.