
Sketch Then Paint: Hierarchical Reinforcement Learning for Diffusion Multi-Modal Large Language Models

Written by
  • Haebom
Category
Empty

Authors

Siqi Luo, Jianghan Shen, Yi Xin, Huayu Zheng, Haoxing Chen, Yan Tai, Yue Li, Junjun He, Yihao Liu, Guangtao Zhai, Yuewen Cao, Xiaohong Liu

💡 Overview

์ด ๋…ผ๋ฌธ์€ ํ™•์‚ฐ ๋‹ค์ค‘ ๋ชจ๋‹ฌ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(dMLLMs)์˜ ๊ฐ•ํ™” ํ•™์Šต ์ตœ์ ํ™” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ์ƒˆ๋กœ์šด ์ ‘๊ทผ๋ฒ•์ธ ๊ณ„์ธต์  ํ† ํฐ GRPO(HT-GRPO)๋ฅผ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. HT-GRPO๋Š” ์ด๋ฏธ์ง€ ์ƒ์„ฑ ๊ณผ์ •์˜ ๊ณ„์ธต์  ๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•˜์—ฌ, ๊ธ€๋กœ๋ฒŒ ๋ ˆ์ด์•„์›ƒ์„ ๊ฒฐ์ •ํ•˜๋Š” ์ดˆ๊ธฐ ํ† ํฐ๊ณผ ๋กœ์ปฌ ๋””ํ…Œ์ผ์„ ๋‹ด๋‹นํ•˜๋Š” ํ›„๊ธฐ ํ† ํฐ์— ์ฐจ๋“ฑ์ ์ธ ๋ณด์ƒ์„ ๋ถ€์—ฌํ•ฉ๋‹ˆ๋‹ค. ์ œ์•ˆ๋œ "Sketch-Then-Paint" ํ›ˆ๋ จ ๋ฐฉ์•ˆ์€ ๊ธ€๋กœ๋ฒŒ, ๊ตฌ์กฐ, ์ •์ œ ๋‹จ๊ณ„๋ฅผ ํ†ตํ•ด ์ •์ฑ… ์ตœ์ ํ™”๋ฅผ ์ฒด๊ณ„ํ™”ํ•˜๋ฉฐ, ์‹คํ—˜ ๊ฒฐ๊ณผ GenEval ๋ฐ DPG ๋ฒค์น˜๋งˆํฌ์—์„œ ์ƒ๋‹นํ•œ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์ž…์ฆํ–ˆ์Šต๋‹ˆ๋‹ค.

๐Ÿ”‘ ์‹œ์‚ฌ์  ๋ฐ ํ•œ๊ณ„

• The hierarchical structure inherent in the image generation process of dMLLMs can be effectively incorporated into reinforcement learning policy optimization.
• The "Sketch-Then-Paint" training scheme and the hierarchical credit assignment mechanism yield substantial improvements in image quality, aesthetics, and user preference.
• Computing importance ratios with a prompt-conditioned estimator overcomes the limitation of existing approaches, which assign the same reward to every token.
• The generalizability of the method and its applicability to other types of generative models require further study.