From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models

Created by
  • Haebom

Authors

Ziyan Wang, Enmao Diao, Qi Le, Pu Wang, Minwoo Lee, Shu-ping Yeh, Evgeny Stupachenko, Hao Feng, Li Yang

💡 Overview

์ด ๋…ผ๋ฌธ์€ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLM)์˜ ํšจ์œจ์ ์ธ ๋ฐฐํฌ๋ฅผ ์œ„ํ•œ ๊ตฌ์กฐ์  ๊ฐ€์ง€์น˜๊ธฐ(structured pruning) ๊ธฐ๋ฒ•์„ ๊ฐœ์„ ํ•˜๋Š” ๋ฐ ์ดˆ์ ์„ ๋งž์ถฅ๋‹ˆ๋‹ค. ๊ธฐ์กด์˜ ๊ตญ์†Œ์ (local) ๊ฐ€์ง€์น˜๊ธฐ ๋ฐฉ์‹์ด ์ž‘์—…๋ณ„ ์„ฑ๋Šฅ ํ–ฅ์ƒ์— ํ•œ๊ณ„๋ฅผ ๋ณด์ด์ž, ๋ณธ ์—ฐ๊ตฌ๋Š” ์ „์—ญ์ (global) ๊ด€์ ์—์„œ ๊ฐ€์ง€์น˜๊ธฐ๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” GISP(Global Iterative Structured Pruning) ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. GISP๋Š” ์†์‹ค ๊ธฐ๋ฐ˜ ์ค‘์š”๋„ ์ ์ˆ˜๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ฐ˜๋ณต์ ์œผ๋กœ ๊ตฌ์กฐ๋ฅผ ์ œ๊ฑฐํ•จ์œผ๋กœ์จ, ์›๋ž˜ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋ฉด์„œ๋„ ๋” ๋†’์€ ํฌ์†Œ์„ฑ์„ ๋‹ฌ์„ฑํ•˜๊ณ  ํŠน์ • ์ž‘์—…์— ๋Œ€ํ•œ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค.

🔑 Implications and Limitations

• Importance of task-specific pruning: Combining global importance estimation with an iterative pruning process goes beyond merely preserving general metrics (e.g., perplexity) and can effectively improve performance on specific downstream tasks.
• Support for a "prune once, deploy many" workflow: Iterative pruning produces nested subnetworks, which suggests that models at different sparsity levels can be managed and deployed efficiently.
• Pruning timing and complexity: The work presents a post-training pruning method, but further research may be needed on the optimal pruning schedule and on tuning pruning strength for a given task. The computational cost of the iterative optimization process itself on large models also has to be taken into account.
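The "prune once, deploy many" point follows from the nesting property: if structures are removed in a fixed order, the subnetwork at any sparsity level is obtained by dropping a prefix of that order, so a sparser model is always contained in a denser one. The sketch below only illustrates this property; `pruning_order` is a made-up list standing in for the output of an iterative pruning run, and `subnetwork_at` is a hypothetical helper, not something taken from the paper.

```python
from typing import List, Set

# Hypothetical result of one iterative pruning run: structure ids listed from
# the first one removed to the last one that would be removed.
pruning_order: List[str] = [
    "mlp.3.ch7", "attn.1.head2", "mlp.0.ch4", "attn.2.head0",
    "mlp.1.ch1", "attn.0.head3", "mlp.2.ch5", "attn.3.head1",
]


def subnetwork_at(order: List[str], sparsity: float) -> Set[str]:
    """Structures kept at a given sparsity: everything except the first removals."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_removed = int(sparsity * len(order))
    return set(order[n_removed:])


# One stored order serves several deployment targets.
kept_25 = subnetwork_at(pruning_order, 0.25)
kept_50 = subnetwork_at(pruning_order, 0.50)

# Nesting: the sparser subnetwork is a subset of the denser one.
print(kept_50 <= kept_25)
```

Storing a single pruning order (or the equivalent masks) is therefore enough to derive a model for each sparsity budget, instead of rerunning pruning per target.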