OneLatent: Single-Token Compression for Visual Latent Reasoning

Created by
  • Haebom

Authors

Bo Lv, Yasheng Sun, Junjie Wang, Haoxiang Shi

💡 Overview

๋ณธ ๋…ผ๋ฌธ์€ Chain-of-thought (CoT) ์ถ”๋ก  ์‹œ ๋ฐœ์ƒํ•˜๋Š” ๋ง‰๋Œ€ํ•œ ์ถ”๋ก  ๋น„์šฉ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ์ค‘๊ฐ„ ์ถ”๋ก  ๊ณผ์ •์„ ๋‹จ์ผ ์ž ์žฌ ํ† ํฐ์œผ๋กœ ์••์ถ•ํ•˜๋Š” OneLatent ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ํ…์ŠคํŠธ ๊ธฐ๋ฐ˜ ์ถ”๋ก  ๋‹จ๊ณ„๋ฅผ ์ด๋ฏธ์ง€๋กœ ๋ Œ๋”๋งํ•˜๊ณ  DeepSeek-OCR์˜ ์€๋‹‰ ์ƒํƒœ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๊ฒฐ์ •๋ก ์  ๊ฐ๋… ์‹ ํ˜ธ๋ฅผ ์–ป์Œ์œผ๋กœ์จ, ๋ชจ๋ธ์ด ์ƒ์„ธํ•œ ํ…์ŠคํŠธ๋ฅผ ์ถœ๋ ฅํ•˜์ง€ ์•Š๊ณ ๋„ ๊ฐ๋…์ด ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด OneLatent๋Š” ํ‰๊ท  ์ถœ๋ ฅ ๊ธธ์ด๋ฅผ $11\times$ ์ค„์ด๋ฉด์„œ ์ •ํ™•๋„๋Š” $2.21$ ์ •๋„๋งŒ ๊ฐ์†Œ์‹œํ‚ค๊ณ , ์ถœ๋ ฅ ํ† ํฐ ๊ธฐ์—ฌ๋„๋ฅผ $6.8\times$ ํ–ฅ์ƒ์‹œ์ผฐ์Šต๋‹ˆ๋‹ค.

🔑 Implications and Limitations

• Substantially improves the efficiency of CoT reasoning, which makes it more practical to apply in real settings.
• By rendering text as images, offers a new way to visually verify and audit the reasoning process.
• Achieves high performance on long-chain logical reasoning tasks with only a single latent token, demonstrating strong generalization even under compression.
• Further research is needed on the computational cost of the text-to-image rendering step itself and on the accuracy loss it may introduce.