Nano World Models: A Minimalist Implementation of Future Video Prediction

Posted by
  • Haebom
Authors

Siqiao Huang, Partha Kaushik, Michael Chen, Hengkai Pan, Kaiwen Geng, Omar Chehab, Fernando Moreno-Pino, Max Simchowitz

💡 Overview

This paper proposes "Nano World Models", a concise, reproducible, and extensible implementation for world model research, a line of work that plays an important role in building next-generation predictive simulators. The codebase provides a unified interface for future video prediction built around diffusion forcing, which enables controlled studies of the individual components. Through experiments across a range of environments, the authors analyze the factors that affect prediction quality and rollout behavior.
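For readers unfamiliar with diffusion forcing, its defining training step is that every frame in a sequence receives its own independently sampled noise level, rather than one shared level for the whole clip. The following is a minimal NumPy sketch of that noising step only. It is not taken from the Nano World Models codebase: the function names, the cosine-style schedule, and the (frames, height, width, channels) layout are all assumptions made for illustration.

```python
import numpy as np


def cosine_alpha_bar(num_levels: int) -> np.ndarray:
    """Hypothetical cosine-style schedule: signal fraction per noise level.

    Values decrease strictly from near 1 (almost clean) to near 0 (almost pure noise).
    """
    s = np.linspace(0.0, 1.0, num_levels + 2)[1:-1]  # skip the exact endpoints 0 and 1
    return np.cos(0.5 * np.pi * s) ** 2


def noise_frames_independently(video, alpha_bar, rng):
    """Apply an independently sampled noise level to each frame.

    video: array of shape (num_frames, ...) holding clean frames (assumed layout).
    Returns the noisy frames, the per-frame level indices, and the noise used.
    """
    num_frames = video.shape[0]
    # One noise level per frame, drawn independently (the core of diffusion forcing).
    levels = rng.integers(0, len(alpha_bar), size=num_frames)
    eps = rng.standard_normal(video.shape)
    # Reshape so each frame's scalar broadcasts over its remaining dimensions.
    a = alpha_bar[levels].reshape(num_frames, *([1] * (video.ndim - 1)))
    noisy = np.sqrt(a) * video + np.sqrt(1.0 - a) * eps
    return noisy, levels, eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal((8, 16, 16, 3))  # stand-in for 8 video frames
    noisy, levels, _ = noise_frames_independently(clip, cosine_alpha_bar(10), rng)
    print(noisy.shape, levels)
```

In a full training loop, a denoising model would receive `noisy` together with `levels` and be trained to recover the noise or the clean frames. At rollout time, past frames can be held at low noise while future frames start at high noise, which is what makes this scheme a natural fit for future video prediction.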

🔑 Implications and Limitations

• Provides a foundation for systematically analyzing how the key components of modern world models affect results.
• Releases the code, configurations, evaluation scripts, and pretrained checkpoints, promoting openness, reproducibility, and scientific progress in world model research.
• The current implementation focuses on one prediction methodology (diffusion forcing), so extending it to other generative objectives or new model architectures remains future work.