Theory-optimal Quantization Based on Flatness

Posted by
  • Haebom

Authors

Xiusheng Huang, Zhe Li, Xuanwu Yin, Lu Wang, Yequan Wang, Dong Li, Emad Barsoum, Kang Liu

💡 Overview

This paper presents a new approach to the performance degradation caused by activation outliers during LLM quantization. It introduces a new metric, 'Flatness', that quantifies the outlier distribution, and derives a theoretically optimized quantization method from that metric. The proposed Bidirectional Diagonal Quantization (BDQ) framework uses learnable diagonal operations to disperse outlier patterns effectively, which substantially improves LLM quantization performance.
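To make the idea of dispersing outliers with a diagonal operation concrete, the sketch below applies a generic per-channel diagonal rescaling to a toy activation matrix. It is not the paper's BDQ: the diagonal entries `d` are hand-picked rather than learned, and `peak_to_mean` is a hypothetical stand-in for peakedness, not the paper's Flatness metric (which this summary does not define). It only illustrates that scaling activations by the inverse of a diagonal matrix and weights by the matrix itself leaves the layer output unchanged while shrinking the outlier.

```python
# Illustrative only: generic diagonal rescaling, not the paper's BDQ method.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def peak_to_mean(rows):
    """Largest magnitude divided by mean magnitude.

    Hypothetical proxy for how peaked the values are; NOT the paper's Flatness.
    """
    mags = [abs(v) for row in rows for v in row]
    return max(mags) / (sum(mags) / len(mags))

# Toy activations (2 tokens x 4 channels); channel index 2 carries an outlier.
X = [[0.5, -1.0, 40.0, 0.8],
     [-0.7, 0.9, -36.0, 1.1]]

# Toy weights (4 input channels x 2 output channels).
W = [[0.2, -0.1],
     [0.4, 0.3],
     [-0.05, 0.02],
     [0.1, -0.3]]

# Assumed, hand-picked diagonal entries; a learnable method would fit these.
d = [1.0, 1.0, 32.0, 1.0]

X_scaled = [[v / s for v, s in zip(row, d)] for row in X]  # X @ diag(d)^-1
W_scaled = [[v * s for v in row] for row, s in zip(W, d)]  # diag(d) @ W

Y = matmul(X, W)
Y_scaled = matmul(X_scaled, W_scaled)

print(peak_to_mean(X), peak_to_mean(X_scaled))
```

The rescaling moves magnitude from the activations into the matching weight row, so a practical method has to balance the two sides rather than simply shrink the activations.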

🔑 Implications and Limitations

• The paper mathematically models the effect of activation outliers on LLM quantization and proposes the 'Flatness' metric to quantify it, addressing a limitation of prior work.
• With the Bidirectional Diagonal Quantization (BDQ) framework, it demonstrates substantial performance gains, and an advantage over existing state-of-the-art methods, even at low-bit settings such as W4A4 and W2A4KV16.
• BDQ optimizes the dispersion of outliers through learnable diagonal operations, an advance for both the theoretical and the practical side of LLM quantization.
• Further study may be needed on the computational complexity of the proposed method and on how well it generalizes across different LLM architectures.