ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation

Posted by
  • Haebom
Authors

Suyoung Kim, Sunghyun Wee, Hyeonjin Kim, Kyomin Hwang, Hyunho Lee, Nojun Kwak

💡 Overview

This paper proposes ReSpinQuant, a new method for addressing the activation outlier problem that arises in large language model (LLM) quantization. Existing global rotation approaches are limited in expressiveness, while layer-wise adaptation approaches incur online computation overhead. To overcome the former and reduce the latter, ReSpinQuant uses offline activation rotation fusion based on residual subspace rotation. As a result, it achieves high expressiveness and negligible inference overhead at the same time, and shows state-of-the-art performance in W4A4 and W3A3 quantization.
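The general idea behind rotation-based quantization, which the summary above builds on, can be sketched in a few lines. This is a minimal illustration and not the ReSpinQuant algorithm: it uses a random orthogonal matrix rather than learned layer-wise rotations, it does not reproduce the residual subspace scheme, and the helper `random_orthogonal`, the matrix sizes, and the outlier scale are all assumptions made for the example. It shows only two things: folding a rotation into the weights offline leaves the layer output unchanged, and the rotation spreads a single outlier channel across many channels.

```python
import numpy as np

def random_orthogonal(dim, seed=0):
    """Hypothetical helper: a random orthogonal matrix via QR decomposition."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flipping column signs keeps q orthogonal and makes the result unique.
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(1)
dim = 64
x = rng.standard_normal((8, dim))    # toy activations: 8 tokens, 64 channels
x[:, 3] *= 50.0                      # inject one outlier channel
w = rng.standard_normal((dim, dim))  # toy weight matrix of a linear layer

rot = random_orthogonal(dim)
x_rot = x @ rot      # rotated activations
w_rot = rot.T @ w    # rotation folded into the weights ahead of time

y_ref = x @ w
y_rot = x_rot @ w_rot  # equals y_ref because rot @ rot.T is the identity

print("outputs match:", np.allclose(y_ref, y_rot))
print("peak |activation| before:", np.abs(x).max())
print("peak |activation| after: ", np.abs(x_rot).max())
```

A smaller peak magnitude means a low-bit quantizer can use a finer step size for the same range. How much this helps in practice depends on the rotation and the quantizer, which is where methods such as the one summarized here differ.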

🔑 Implications and Limitations

• Presents an effective LLM quantization framework that combines the high expressiveness of layer-wise adaptation with the efficiency of global rotation.
• Presents a new approach that resolves the online computation overhead of existing layer-wise quantization methods through offline activation rotation fusion.
• Demonstrates performance that exceeds existing state-of-the-art methods across a range of quantization settings, which can contribute to improving LLM efficiency.
• Because the proposed method adds complexity, additional optimization may be required when applying it in practice.