EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation

Posted by
  • Haebom

Authors

Shu-Hao Zhang, Le-Tong Huang, Xiang-Sheng Deng, Xin-Yi Zou, Chen Wu, Nan Li, Shao-Qun Zhang, Zhi-Hua Zhou

💡 Overview

This paper proposes EdgeRazor, a lightweight framework for deploying large language models (LLMs) efficiently in resource-constrained environments. EdgeRazor uses mixed-precision quantization-aware distillation, with the goal of raising the compression ratio without degrading LLM performance. In the reported experiments, EdgeRazor outperforms existing state-of-the-art 2-bit and 3-bit models while substantially cutting training cost, reducing storage footprint, and speeding up decoding.
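To make the storage side of "mixed precision" concrete, the arithmetic can be sketched as below. This is only an illustration: the function names, the 75%/25% split between 2-bit and 4-bit weights, the 1B parameter count, and the 16-bit baseline are all assumptions chosen for the example, not figures from the paper. The sketch also ignores per-group scales and other metadata that a real quantized checkpoint stores.

```python
def average_bits(param_counts, bit_widths):
    """Parameter-weighted mean bit-width across groups of weights."""
    total = sum(param_counts)
    return sum(n * b for n, b in zip(param_counts, bit_widths)) / total


def compression_ratio(avg_bits, baseline_bits=16):
    """How many times smaller the weights are than the baseline format."""
    return baseline_bits / avg_bits


# Hypothetical split (an assumption): 75% of weights at 2 bits, 25% at 4 bits.
counts = [750_000_000, 250_000_000]
bits = [2, 4]

avg = average_bits(counts, bits)            # mean bits per weight
ratio = compression_ratio(avg)              # size reduction versus 16-bit weights
size_gb = sum(counts) * avg / 8 / 1e9       # weight storage in GB (10^9 bytes)

print(f"average bits: {avg}, ratio vs FP16: {ratio:.1f}x, size: {size_gb} GB")
```

Under these assumed numbers, the model averages 2.5 bits per weight, which is why mixed-precision schemes can land between the accuracy of a higher bit-width and the footprint of a lower one.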

🔑 Key Takeaways and Limitations

• Maximized efficiency: EdgeRazor overcomes the accuracy degradation that existing quantization methods suffer at 4 bits and below, and achieves high compression ratios and fast inference across a range of bit-widths.
• Effective distillation: Structured mixed-precision quantization, layer-adaptive feature distillation, and an entropy-aware KL divergence together preserve the quantized model's performance and improve training efficiency.
• Limited hardware coverage: The proposed framework is validated mainly on specific hardware environments, so further work is needed on generalizing and optimizing it for a broader range of edge devices.
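The two generic ingredients behind the distillation bullet above, low-bit quantization of weights and a KL loss that pulls a quantized student toward a full-precision teacher, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not EdgeRazor's actual method: the summary does not give the paper's formulas, so `fake_quantize` uses ordinary symmetric uniform quantization, and the entropy-based weight in `weighted_kd_loss` is a hypothetical stand-in for whatever "entropy-aware" means in the paper. All function names are invented for this example.

```python
import math


def fake_quantize(weights, bits):
    """Symmetric uniform quantization: snap to a signed integer grid, then rescale."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1                      # e.g. 1 for 2-bit, 3 for 3-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax                          # one scale for the whole group
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def softmax(logits):
    m = max(logits)                                 # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def weighted_kd_loss(teacher_logits, student_logits):
    """KL(teacher || student) scaled by a HYPOTHETICAL entropy-based weight."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    # Assumption, not from the paper: trust confident teacher predictions more.
    # Requires more than one class so that the maximum entropy is nonzero.
    weight = 1.0 - entropy(p) / math.log(len(p))
    return weight * kl_divergence(p, q)


quantized = fake_quantize([0.9, -0.4, 0.1, -1.2], bits=3)
loss = weighted_kd_loss([2.0, 0.5, -1.0], [1.0, 1.0, -0.5])
print(quantized, loss)
```

In a real quantization-aware training loop the rounding step would sit inside the forward pass with a straight-through gradient estimator, and the loss would be averaged over tokens; both are omitted here to keep the sketch dependency-free.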