
MCLR: Improving Conditional Modeling via Inter-Class Likelihood-Ratio Maximization and Unifying Classifier-Free Guidance with Alignment Objectives

Written by
  • Haebom

Authors

Xiang Li, Yixuan Jia, Xiao Li, Jeffrey A. Fessler, Rongrong Wang, Qing Qu

💡 Overview

This paper aims to improve diffusion models by addressing insufficient inter-class separation, which it identifies as a limitation of standard denoising score matching (DSM). To that end, it proposes MCLR (Maximizing Inter-Class Likelihood-Ratios), a new training objective that maximizes the likelihood ratio between classes, and shows that this yields gains comparable to classifier-free guidance (CFG) without applying CFG at inference time. It further proves that MCLR is connected to the theoretical basis of CFG, and explains CFG in terms of an alignment-based objective.
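For readers unfamiliar with the inference-time technique that MCLR is meant to replace, the following is a minimal sketch of the standard CFG score combination on a toy one-dimensional example. It is not the paper's method or code: the two-class Gaussian setup, the function names, and the convention that `w = 0` recovers the plain conditional score are all assumptions chosen for illustration.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def cond_score(x, mean, std=1.0):
    """Class-conditional score: d/dx log N(x; mean, std^2)."""
    return -(x - mean) / std**2

def uncond_score(x, means, std=1.0):
    """Score of an equal-weight Gaussian mixture over all classes.

    The mixture score is the posterior-weighted average of the class scores.
    """
    x = np.asarray(x, dtype=float)
    dens = np.stack([normal_pdf(x, m, std) for m in means])   # shape (K, ...)
    post = dens / dens.sum(axis=0)                            # p(class | x)
    scores = np.stack([cond_score(x, m, std) for m in means])
    return (post * scores).sum(axis=0)

def cfg_score(x, class_mean, means, w, std=1.0):
    """Standard CFG combination (assumed convention): s_c + w * (s_c - s_u)."""
    s_c = cond_score(x, class_mean, std)
    s_u = uncond_score(x, means, std)
    return s_c + w * (s_c - s_u)

# Toy setup (hypothetical): two classes centred at -2 and +2.
means = [-2.0, 2.0]
x = np.array([0.0])

plain = cfg_score(x, 2.0, means, w=0.0)   # conditional score only
guided = cfg_score(x, 2.0, means, w=2.0)  # pushed harder toward class +2
print(plain, guided)
```

At the midpoint between the two classes the unconditional score vanishes, so guidance simply amplifies the pull toward the target class. In the toy setting this is the extra inter-class separation that the summary says MCLR builds into training instead.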

🔑 Implications and Limitations

• The method effectively remedies insufficient inter-class separation, a fundamental problem of existing diffusion model training, so generation quality can be improved without a separate inference-time technique.
• The proposed MCLR training objective internalizes the effect of inference-time CFG, strengthening the model's conditional generation ability from the training stage onward.
• The paper provides a theoretical interpretation of CFG and clarifies its connection to alignment-based objectives, offering a new understanding of diffusion model training and guidance.
• Further validation may be needed to establish whether MCLR matches or outperforms CFG across all types of diffusion models and datasets.