
Pretraining large language models with MXFP4

์ž‘์„ฑ์ž
  • Haebom
์นดํ…Œ๊ณ ๋ฆฌ
Empty

์ €์ž

Musa Cim, Poovaiah Palangappa, Miro Hodak, Ravi Dwivedula, Meena Arunachalam, Mahmut Taylan Kandemir

๐Ÿ’ก ๊ฐœ์š”

๋ณธ ์—ฐ๊ตฌ๋Š” ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLM) ํ•™์Šต ์‹œ FP4(4๋น„ํŠธ ๋ถ€๋™์†Œ์ˆ˜์ ) ์–‘์žํ™”๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ ๋ฐœ์ƒํ•˜๋Š” ๋ฐœ์‚ฐ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. ํŠนํžˆ, ์ˆœ๋ฐฉํ–ฅ ํ™œ์„ฑํ™” ๋ฐ ํ™œ์„ฑํ™” ๊ธฐ์šธ๊ธฐ๊ฐ€ ์•ˆ์ •์ ์ธ ์ƒํƒœ์—์„œ๋„ ํ•™์Šต์ด ๋ถˆ์•ˆ์ •ํ•ด์ง€๋Š” ์›์ธ์„ MXFP4(Mixed Precision FP4) ์–‘์žํ™”์˜ ๊ฐ ๋‹จ๊ณ„(์ˆœ๋ฐฉํ–ฅ ์ „ํŒŒ, ํ™œ์„ฑํ™” ๊ธฐ์šธ๊ธฐ, ๊ฐ€์ค‘์น˜ ๊ธฐ์šธ๊ธฐ)๋ฅผ ์ ์ง„์ ์œผ๋กœ ํ™œ์„ฑํ™”ํ•˜๋ฉฐ ๋ถ„์„ํ–ˆ์Šต๋‹ˆ๋‹ค. ์—ฐ๊ตฌ ๊ฒฐ๊ณผ, ๊ฐ€์ค‘์น˜ ๊ธฐ์šธ๊ธฐ์˜ FP4 ์–‘์žํ™”๊ฐ€ ํ•™์Šต ๋ฐœ์‚ฐ์˜ ์ฃผ๋œ ์›์ธ์ด๋ฉฐ, ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๊ตฌ์กฐ์ ์ธ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ”‘ ์‹œ์‚ฌ์  ๋ฐ ํ•œ๊ณ„

โ€ข
FP4 ์–‘์žํ™”๋ฅผ LLM ํ•™์Šต์— ์ ์šฉํ•  ๋•Œ, ๊ฐ€์ค‘์น˜ ๊ธฐ์šธ๊ธฐ(Wgrad) ์–‘์žํ™”๊ฐ€ ํ•™์Šต ์•ˆ์ •์„ฑ์„ ์ €ํ•ดํ•˜๋Š” ๊ฐ€์žฅ ํฐ ์š”์ธ์ž„์„ ์‹คํ—˜์ ์œผ๋กœ ๊ทœ๋ช…ํ–ˆ์Šต๋‹ˆ๋‹ค.
โ€ข
์ˆœ๋ฐฉํ–ฅ ์ „ํŒŒ(Fprop)์™€ ํ™œ์„ฑํ™” ๊ธฐ์šธ๊ธฐ(Dgrad)์˜ FP4 ์–‘์žํ™”๋Š” ์ƒ๋Œ€์ ์œผ๋กœ ํ•™์Šต ์•ˆ์ •์„ฑ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์ด ์ ์œผ๋ฉฐ, ์ ์€ ์–‘์˜ ์ถ”๊ฐ€ ํ† ํฐ ์š”๊ตฌ๋Ÿ‰์œผ๋กœ๋„ ์•ˆ์ •์ ์ธ ํ•™์Šต์ด ๊ฐ€๋Šฅํ•จ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.
โ€ข
FP4 ํ•™์Šต์˜ ๋ถˆ์•ˆ์ •์„ฑ์€ ๋‹จ์ˆœํžˆ ๋ฌด์ž‘์œ„์„ฑ ๋ถ€์กฑ ๋•Œ๋ฌธ์ด ์•„๋‹ˆ๋ผ, ๋ฏผ๊ฐํ•œ ๊ธฐ์šธ๊ธฐ ๊ฒฝ๋กœ๋ฅผ ๋”ฐ๋ผ ๋ฐœ์ƒํ•˜๋Š” ๊ตฌ์กฐ์ ์ธ ๋ฏธ์„ธ ์Šค์ผ€์ผ๋ง ์˜ค๋ฅ˜์— ์˜ํ•ด ๋ฐœ์ƒํ•˜๋ฉฐ, ๊ฒฐ์ •๋ก ์  Hadamard ํšŒ์ „์ด ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Œ์„ ์‹œ์‚ฌํ•ฉ๋‹ˆ๋‹ค.
โ€ข
๋ณธ ์—ฐ๊ตฌ๋Š” ์†Œํ”„ํŠธ์›จ์–ด ์—๋ฎฌ๋ ˆ์ด์…˜์— ์˜์กดํ•˜์ง€ ์•Š๊ณ  AMD Instinct MI355X GPU์˜ ๋„ค์ดํ‹ฐ๋ธŒ MXFP4 ์ง€์›์„ ํ™œ์šฉํ•˜์—ฌ ์‹คํ—˜์„ ์ง„ํ–‰ํ–ˆ์œผ๋ฉฐ, ์ด๋Š” ํ–ฅํ›„ LLM ํ•™์Šต ํšจ์œจ์„ฑ ์ฆ๋Œ€๋ฅผ ์œ„ํ•œ ํ•˜๋“œ์›จ์–ด ์ˆ˜์ค€์˜ ์ตœ์ ํ™” ์—ฐ๊ตฌ์— ๊ธฐ์—ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๐Ÿ‘