
GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization

Author
  • Haebom
Category
Empty

Authors

Maoyang Xiang, Bo Wang, Tao Luo

💡 Overview

Deploying large language models (LLMs) and vision transformers (ViTs) on edge devices is limited by memory constraints and by the latency bottleneck of multiply-accumulate (MAC) operations. In low-bit quantization, Power-of-Two (PoT) quantization improves hardware efficiency by replacing MACs with bit-shift operations, but its low angular resolution degrades accuracy. To address this, the proposed GoQuant synthesizes a residual grid through dual-basis geometric projection to raise the resolution, and uses an analytical solver to enable fast model calibration.
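The resolution gap of a plain PoT grid, and why a residual term helps, can be illustrated with a small sketch. It assumes a generic greedy residual scheme. This is *not* GoQuant's dual-basis geometric projection, which this summary does not specify. First, each weight is rounded to the nearest signed power of two on a max-abs-normalized grid. Second, a second power-of-two term quantizes the leftover residual. Third, a per-tensor scale is fitted in closed form by least squares, standing in for an analytical (gradient-free) calibration step. The names `nearest_pot`, `pot_codes`, `ls_scale`, and `mse`, the exponent range [-6, 0], and the sample weights are all hypothetical.

```python
from typing import List, Tuple

def nearest_pot(x: float, e_min: int, e_max: int) -> float:
    """Closest value to x in {0} U {sign(x) * 2**e : e_min <= e <= e_max}."""
    best = 0.0
    for e in range(e_min, e_max + 1):
        level = 2.0 ** e if x >= 0 else -(2.0 ** e)
        if abs(x - level) < abs(x - best):
            best = level
    return best

def pot_codes(w: List[float], terms: int,
              e_min: int = -6, e_max: int = 0) -> Tuple[float, List[float]]:
    """Greedy residual PoT coding (hypothetical): every extra term quantizes
    what the previous terms missed. Returns (max-abs scale, normalized codes)."""
    scale = max(abs(v) for v in w) or 1.0
    codes = []
    for v in w:
        r, c = v / scale, 0.0
        for _ in range(terms):
            t = nearest_pot(r, e_min, e_max)
            c += t
            r -= t
        codes.append(c)
    return scale, codes

def ls_scale(w: List[float], codes: List[float]) -> float:
    """Closed-form argmin_s sum_i (w_i - s * c_i)^2 for fixed codes (no gradient steps)."""
    den = sum(c * c for c in codes)
    return sum(v * c for v, c in zip(w, codes)) / den if den > 0 else 1.0

def mse(w: List[float], s: float, codes: List[float]) -> float:
    """Mean squared reconstruction error of s * codes against w."""
    return sum((v - s * c) ** 2 for v, c in zip(w, codes)) / len(w)

w = [0.9, -0.3, 0.05, 0.6, -0.75, 0.11]
s, c1 = pot_codes(w, terms=1)
_, c2 = pot_codes(w, terms=2)
print("1-term PoT MSE:        ", mse(w, s, c1))
print("2-term PoT MSE:        ", mse(w, s, c2))
print("2-term + LS scale MSE: ", mse(w, ls_scale(w, c2), c2))
```

Single-term PoT levels grow sparse toward the top of the range (nothing lies between 0.5 and 1.0), so a weight such as 0.6/0.9 ≈ 0.67 incurs a large rounding error. The residual term never increases any per-weight error, because 0 is always a candidate. For fixed codes, the least-squares scale can only lower the MSE relative to the max-abs scale.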

🔑 Implications and Limitations

• GoQuant addresses the low angular resolution that limits PoT quantization and, compared with existing MAC-based quantization methods (e.g., AWQ), delivers better accuracy, especially in ultra-low-bit settings such as 3-bit.
• The proposed analytical solver calibrates a full model much faster than compute-intensive gradient-based optimization, which makes the method more practical to apply.
• Beyond its results on LLMs and ViTs, GoQuant also shows hardware-efficiency gains in standard-cell RTL synthesis, suggesting it can support efficient deployment on edge devices.
• Although GoQuant relies on "strictly shift-and-add operations," some complex operations may still leave room for optimization. Validation on a wider range of hardware architectures, along with further optimization work, may also be needed.