SD-MoE: Spectral Decomposition for Effective Expert Specialization

Created by
  • Haebom

Authors

Ruijun Huang, Fang Dong, Xin Zhang, Hengjie Cao, Zhendong Huang, Anrui Chen, Jixian Zhou, Mengyi Chen, Yifeng Yang, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Qin Lv, Robert P. Dick, Yuan Cheng, Fan Yang, Tun Lu, Chun Zhang, Li Shang

💡 Overview

This study addresses the failure of expert specialization in Mixture-of-Experts (MoE) models. A spectral analysis of the parameter and gradient spaces reveals that experts share overlapping spectral components and that the gating mechanism is biased. To address this, the authors propose SD-MoE, which decomposes parameters and gradients in the spectral space. SD-MoE improves downstream task performance and achieves effective expert specialization.
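To make the idea of "overlapping spectral components" concrete, the following is a minimal sketch of one way such overlap could be measured with an SVD. It is not the paper's procedure: the helper names (`top_subspace`, `subspace_overlap`, `make_experts`, `mean_pairwise_overlap`), the toy expert weights, and the choice of comparing only the leading singular direction are all assumptions made for illustration.

```python
import itertools

import numpy as np


def top_subspace(weight, k):
    """Return the top-k left singular vectors of a weight matrix."""
    u, _, _ = np.linalg.svd(weight, full_matrices=False)
    return u[:, :k]


def subspace_overlap(w_a, w_b, k):
    """Mean squared cosine of the principal angles between the
    top-k left singular subspaces of two matrices (0 = orthogonal, 1 = identical)."""
    u_a = top_subspace(w_a, k)
    u_b = top_subspace(w_b, k)
    return float(np.linalg.norm(u_a.T @ u_b, "fro") ** 2 / k)


def make_experts(num_experts, dim, shared_scale, rng):
    """Toy experts (hypothetical data): a common rank-1 component plus independent noise."""
    u = rng.standard_normal((dim, 1))
    v = rng.standard_normal((1, dim))
    shared = shared_scale * (u @ v) / dim
    return [
        shared + rng.standard_normal((dim, dim)) / np.sqrt(dim)
        for _ in range(num_experts)
    ]


def mean_pairwise_overlap(experts, k=1):
    """Average the subspace overlap over all pairs of experts."""
    pairs = itertools.combinations(experts, 2)
    return float(np.mean([subspace_overlap(a, b, k) for a, b in pairs]))


rng = np.random.default_rng(0)
overlapping = make_experts(4, 64, shared_scale=20.0, rng=rng)
independent = make_experts(4, 64, shared_scale=0.0, rng=rng)

print("shared component:", mean_pairwise_overlap(overlapping))
print("no shared component:", mean_pairwise_overlap(independent))
```

In this toy setup, experts that share a dominant low-rank component score much higher than experts drawn independently. That is the kind of redundancy the summary describes, although the paper's actual metric and decomposition may differ.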

🔑 Implications and Limitations

• Spectral analysis shows that the failure of expert specialization in MoE models stems from overlapping spectral components across experts and from gating bias.
• The proposed SD-MoE effectively improves expert specialization with minimal additional computation and can be applied to a variety of MoE architectures.
• The current analysis tends to be driven mainly by the universal low-rank structure of human corpora, so further research is needed on how well SD-MoE generalizes across diverse data distributions.