Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving

Created by
  • Haebom

Authors

Chendong Song, Meixuan Wang, Hang Zhou, Hong Liang, Yuan Lyu, Zixi Chen, Yuwei Fan, Zijie Zhou

💡 Overview

๋ณธ ๋…ผ๋ฌธ์€ LLM ๋””์ฝ”๋”ฉ ์•„ํ‚คํ…์ฒ˜์ธ Attention-FFN Disaggregation(AFD)์—์„œ ๋ฐœ์ƒํ•˜๋Š” Attention๊ณผ FFN ์—ฐ์‚ฐ ๋น„์œจ์˜ ์ตœ์ ํ™” ๋ฌธ์ œ๋ฅผ ๋‹ค๋ฃน๋‹ˆ๋‹ค. ๋ฌด์ž‘์œ„์ ์ธ ์›Œํฌ๋กœ๋“œ์™€ ์žฅ์น˜ ๋™๊ธฐํ™”๋กœ ์ธํ•ด ๋ฐœ์ƒํ•˜๋Š” ์„ฑ๋Šฅ ์ €ํ•˜๋ฅผ ๋ถ„์„ํ•˜๊ณ , ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ์ด๋ก ์  ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์ œ์•ˆ๋œ ๋ถ„์„ ๋ชจ๋ธ์€ ์ตœ์ ์˜ Attention/FFN ๋น„์œจ์„ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•˜๋ฉฐ, ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ๊ฒฐ๊ณผ ์‹ค์ œ ์ตœ์  ๋น„์œจ๊ณผ ๋†’์€ ์ •ํ™•๋„๋กœ ์ผ์น˜ํ•˜๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

🔑 Implications and Limitations

• The paper establishes theoretically that the ratio of Attention to FFN compute in an AFD architecture has a critical impact on overall serving performance.
• The proposed analytical framework provides a general method for computing the optimal resource provisioning ratio while accounting for workload randomness.
• The analysis is based on the specific $r$A--$1$F topology; applying it to real, complex distributed system environments and generalizing it to other topologies remain future work.
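
To make the trade-off concrete, here is a minimal toy simulation of an $r$A--$1$F layout. It is a sketch under stated assumptions and not the paper's analytical model: the cost constants, the exponential context-length distribution, the pipelined step time `max(t_attn, t_ffn)`, and the function names `per_device_throughput` and `best_ratio` are all hypothetical choices made for illustration. The idea it shows is that every step waits for the slowest of the `r` Attention instances, while the single FFN instance slows down as `r` grows, so per-device throughput tends to peak at an intermediate ratio.

```python
import random


def per_device_throughput(r, batch=64, mean_ctx=1000.0, trials=200, seed=0):
    """Toy estimate of requests served per unit time per device for rA-1F.

    All constants below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    attn_cost_per_token = 1e-3           # assumed time per cached token read
    ffn_fixed, ffn_per_req = 30.0, 0.1   # assumed linear FFN latency model
    total_step_time = 0.0
    for _ in range(trials):
        # Each Attention instance's time grows with the total context length
        # of its batch; context lengths are drawn at random (assumption).
        attn_times = [
            attn_cost_per_token
            * sum(rng.expovariate(1.0 / mean_ctx) for _ in range(batch))
            for _ in range(r)
        ]
        t_attn = max(attn_times)  # synchronization: wait for the slowest
        t_ffn = ffn_fixed + ffn_per_req * r * batch  # FFN sees all r batches
        total_step_time += max(t_attn, t_ffn)  # assumes perfect pipelining
    mean_step = total_step_time / trials
    # r Attention devices plus 1 FFN device share the work of r * batch requests.
    return r * batch / ((r + 1) * mean_step)


def best_ratio(candidates=range(1, 17), **kwargs):
    """Return the candidate r with the highest simulated per-device throughput."""
    return max(candidates, key=lambda r: per_device_throughput(r, **kwargs))


if __name__ == "__main__":
    for r in (1, 2, 4, 6, 8, 16):
        print(r, round(per_device_throughput(r), 3))
    print("best r among candidates:", best_ratio())
```

With these made-up constants, throughput is limited by Attention stragglers at small `r` and by the FFN instance at large `r`. Changing the constants or the context-length distribution moves the peak, which is the sensitivity an analytical framework would aim to capture in closed form.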