
Weight Concentration Regularization for Improving Pruning Robustness Under High Sparsity

Posted by
  • Haebom
Category
Empty

Authors

Vincent-Daniel Yun, Junhyuk Jo, Sunwoo Lee

💡 Overview

๋ณธ ๋…ผ๋ฌธ์€ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•œ ์›์ƒท ๊ฐ€์ง€์น˜๊ธฐ(one-shot pruning) ์‹œ ๋ฐœ์ƒํ•˜๋Š” ์„ฑ๋Šฅ ์ €ํ•˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. ์ œ์•ˆ๋œ ๊ฐ€์ค‘์น˜ ์ง‘์ค‘ ์ •๊ทœํ™”(Weight Concentration Regularizer, WCR)๋Š” ํ›ˆ๋ จ ๋‹จ๊ณ„์—์„œ ์†Œ์ˆ˜์˜ ์ค‘์š” ํŒŒ๋ผ๋ฏธํ„ฐ์— ์—๋„ˆ์ง€๋ฅผ ์ง‘์ค‘์‹œํ‚ค๊ณ  ๋‚˜๋จธ์ง€๋Š” 0์œผ๋กœ ๋ณด๋‚ด, ๊ฐ€์ง€์น˜๊ธฐ ์‹œ ๊ธฐ๋Šฅ์ ์œผ๋กœ ๊ธฐ์—ฌ๋„๊ฐ€ ๋‚ฎ์€ ํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์ด ์ฃผ๋กœ ์ œ๊ฑฐ๋˜๋„๋ก ์œ ๋„ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด LLM ํŒŒ์ธํŠœ๋‹, ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜, ์˜๋ฃŒ ์˜์ƒ ๋ถ„ํ•  ๋“ฑ ๋‹ค์–‘ํ•œ ์ž‘์—…์—์„œ ๊ฐ€์ง€์น˜๊ธฐ ๊ฐ•๊ฑด์„ฑ(pruning robustness)์„ ์ผ๊ด€๋˜๊ฒŒ ํ–ฅ์ƒ์‹œ์ผฐ์Šต๋‹ˆ๋‹ค.

🔑 Implications and Limitations

• A new training-time regularization technique can effectively mitigate the loss of model performance under high-sparsity pruning.
• Unlike conventional $\ell_1$ regularization or DeepHoyer, WCR concentrates weight energy in specific parameters, which makes pruning more efficient.
• WCR is also compatible with existing pruning-robustness optimization methods, suggesting that combining them could improve performance further.
• The paper provides a convergence analysis of WCR. However, practical use may still require a search for good hyperparameters and further validation across a wider range of model architectures.