GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems

Posted by
  • Haebom

Authors

Sourish Wawdhane, Avinash Kumar, Poulami Das

💡 Overview

This paper proposes GEM, an expert-to-GPU mapping technique that addresses bottlenecks caused by performance imbalance across GPUs in Mixture-of-Experts (MoE) systems. GEM accounts for GPU performance variability: it spreads frequently used experts, and experts that are intermittently used together, across GPUs and keeps them off slow GPUs, so that the GPUs finish each layer at roughly the same time. This yields an end-to-end latency reduction of 7.9% on average and up to 16.5%.
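To make the idea of variability-aware mapping concrete, here is a minimal sketch. It is not the GEM algorithm from the paper; it is a simple greedy heuristic that assumes each expert has a known load estimate and each GPU has a known relative speed. The names `map_experts`, `layer_time`, `expert_load`, `gpu_speed`, and `slots_per_gpu` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def map_experts(expert_load: Dict[int, float],
                gpu_speed: List[float],
                slots_per_gpu: int) -> Dict[int, int]:
    """Greedy sketch (an assumption, not the paper's method).

    Places the heaviest expert first, each time on the GPU whose
    estimated completion time would be lowest after the placement.
    """
    if len(expert_load) > slots_per_gpu * len(gpu_speed):
        raise ValueError("not enough expert slots for all experts")

    finish = [0.0] * len(gpu_speed)  # estimated completion time per GPU
    used = [0] * len(gpu_speed)      # expert slots already taken per GPU
    placement: Dict[int, int] = {}

    for expert, load in sorted(expert_load.items(), key=lambda kv: -kv[1]):
        candidates = [g for g in range(len(gpu_speed)) if used[g] < slots_per_gpu]
        # A slower GPU (smaller speed) needs more time for the same load.
        best = min(candidates, key=lambda g: finish[g] + load / gpu_speed[g])
        placement[expert] = best
        finish[best] += load / gpu_speed[best]
        used[best] += 1
    return placement


def layer_time(placement: Dict[int, int],
               expert_load: Dict[int, float],
               gpu_speed: List[float]) -> float:
    """The layer finishes only when the slowest GPU finishes."""
    totals = [0.0] * len(gpu_speed)
    for expert, gpu in placement.items():
        totals[gpu] += expert_load[expert] / gpu_speed[gpu]
    return max(totals)


if __name__ == "__main__":
    loads = {0: 8.0, 1: 4.0, 2: 2.0, 3: 2.0}  # hypothetical per-expert load
    speeds = [0.8, 1.0]                        # GPU 0 is assumed 20% slower
    aware = map_experts(loads, speeds, slots_per_gpu=2)
    round_robin = {e: e % len(speeds) for e in loads}
    print("variability-aware:", aware, layer_time(aware, loads, speeds))
    print("round-robin:      ", round_robin, layer_time(round_robin, loads, speeds))
```

In this toy setup the round-robin mapping puts the heaviest expert on the slower GPU and the estimated layer time is 12.5, while the greedy mapping moves it to the faster GPU and reaches 10.0. The numbers are illustrative only and unrelated to the latency figures reported in the paper.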

🔑 Implications and Limitations

• The results suggest that an expert placement strategy that accounts for GPU performance variability is important for serving MoE models efficiently.
• The paper introduces a new approach that classifies experts as "consistent" or "temporal" and distributes them across GPUs accordingly.
• GEM showed substantial performance gains in real experiments, but whether it guarantees optimal performance across all kinds of MoE models and hardware configurations requires further validation.
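
The consistent/temporal split mentioned above could be approximated from a routing trace as sketched below. The summary does not give the paper's actual criterion, so the rule used here (an expert is "consistent" if it receives tokens in at least a given fraction of observation windows, otherwise "temporal") and the names `classify_experts`, `trace`, and `threshold` are assumptions made for illustration.

```python
from typing import Dict, List


def classify_experts(trace: List[Dict[int, int]],
                     threshold: float = 0.8) -> Dict[int, str]:
    """Label each expert from a trace of per-window token counts.

    trace[i] maps expert id -> tokens routed to it in window i.
    The threshold rule is an assumed stand-in, not the paper's definition.
    """
    experts = set()
    for window in trace:
        experts.update(window)

    labels: Dict[int, str] = {}
    for expert in sorted(experts):
        # Count the windows in which this expert received any tokens.
        active = sum(1 for window in trace if window.get(expert, 0) > 0)
        frequent = active / len(trace) >= threshold
        labels[expert] = "consistent" if frequent else "temporal"
    return labels


if __name__ == "__main__":
    # Hypothetical trace: expert 0 is always busy, experts 1 and 2 are bursty.
    trace = [{0: 5, 1: 3}, {0: 4}, {0: 6, 2: 7}, {0: 2, 1: 1}, {0: 3}]
    print(classify_experts(trace))
```

Labels like these could then feed a placement step, for example by spreading "consistent" experts across GPUs first and avoiding co-locating "temporal" experts that tend to be active in the same windows.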