VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

Posted by
  • Haebom

Authors

Hongyang Du, Junjie Ye, Xiaoyan Cong, Runhao Li, Jingcheng Ni, Aman Agarwal, Zeqi Zhou, Zekun Li, Randall Balestriero, Yue Wang

💡 Overview

This paper addresses the difficulty that existing video diffusion models (VDMs) have in maintaining 3D structural consistency. The proposed VideoGPA uses a geometry foundation model to automatically generate preference signals for 3D consistency, then trains the VDM on those signals with DPO (Direct Preference Optimization) to strengthen 3D structural consistency. The method substantially improves the temporal stability, geometric plausibility, and motion coherence of generated video without any human annotation.
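The two steps in the overview, scoring candidates with a geometry model and training on the resulting preferences with DPO, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `score_fn` is a hypothetical stand-in for whatever 3D-consistency score the geometry foundation model provides, `build_preference_pairs` and `min_gap` are names invented here, and the loss is the standard DPO objective on scalar log-likelihoods. Video diffusion models do not expose exact log-likelihoods, so a practical objective would need a surrogate; this summary does not say which one VideoGPA uses.

```python
import math
from itertools import combinations


def build_preference_pairs(samples, score_fn, min_gap=0.0):
    """Turn candidates generated for one prompt into (preferred, rejected) pairs.

    score_fn is a hypothetical stand-in for a geometry-model consistency
    score (higher means more 3D-consistent). Pairs whose scores differ by
    min_gap or less are skipped as uninformative.
    """
    scored = [(score_fn(s), s) for s in samples]
    pairs = []
    for (score_a, a), (score_b, b) in combinations(scored, 2):
        if abs(score_a - score_b) <= min_gap:
            continue
        pairs.append((a, b) if score_a > score_b else (b, a))
    return pairs


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: policy log-likelihoods of the preferred / rejected sample.
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin).
    return _softplus(-margin)


if __name__ == "__main__":
    # Toy scores standing in for geometry-model output (assumed values).
    toy_scores = {"clip_a": 0.9, "clip_b": 0.2, "clip_c": 0.5}
    print(build_preference_pairs(list(toy_scores), toy_scores.get))
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

No human labels appear anywhere in the sketch: the ordering inside each pair comes entirely from the automatic score, which is the property the overview highlights.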

🔑 Implications and Limitations

• Combining geometric priors with DPO can effectively address the lack of 3D consistency in existing video generation models.
• The approach is data-efficient: a small amount of preference data is enough to produce high-quality, 3D-consistent video.
• Because training requires no human annotation, the method is practical.
• The performance and biases of the geometry foundation model may affect VideoGPA's results, and maintaining consistency for complex, irregular objects and scenes needs further study.