ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning

Posted by
  • Haebom

Authors

Zhishen Sun, Sizhe Dang, Guang Dai, Haishan Ye

💡 Overview

Reinforcement learning (RL)-based fine-tuning of LLMs consumes a large amount of GPU memory. To address this, the paper proposes ESSAM, a framework that combines the zeroth-order search of evolution strategies (ES) with sharpness-aware maximization (SAM). On the GSM8K dataset, ESSAM reaches performance comparable to RL methods while substantially reducing GPU memory usage. In additional generalization experiments, models trained with ESSAM also generalized better.
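To make the idea of pairing a zeroth-order ES search with a sharpness-aware step more concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: the summary above gives no update rule, so the function names (`es_gradient`, `es_sam_step`, `toy_reward`, `train`), the hyperparameters, and the specific way the SAM probe is formed are all assumptions made for illustration. The one property the sketch does reflect is that ES needs only reward evaluations (forward passes), with no backpropagation or optimizer state.

```python
import math
import random


def es_gradient(reward, theta, sigma, pop_size, rng):
    """Zeroth-order estimate of the reward gradient at theta.

    Uses antithetic Gaussian perturbations: only reward evaluations
    are needed, no backpropagation.
    """
    grad = [0.0] * len(theta)
    for _ in range(pop_size):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = reward([t + sigma * e for t, e in zip(theta, eps)])
        minus = reward([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * pop_size)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad


def es_sam_step(reward, theta, lr=0.05, sigma=0.1, rho=0.05,
                pop_size=32, rng=random):
    """One hypothetical ES + sharpness-aware update (illustrative only)."""
    # 1) Estimate the ascent direction at the current parameters.
    g = es_gradient(reward, theta, sigma, pop_size, rng)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0
    # 2) Assumed SAM probe for maximization: step a distance rho
    #    against the ascent direction, toward the locally worst neighbor.
    probe = [t - rho * x / norm for t, x in zip(theta, g)]
    # 3) Re-estimate the gradient at the probe and apply it to theta.
    g_probe = es_gradient(reward, probe, sigma, pop_size, rng)
    return [t + lr * x for t, x in zip(theta, g_probe)]


def toy_reward(theta):
    """Toy stand-in for a task reward; maximized at theta = [1, 1, 1]."""
    return -sum((t - 1.0) ** 2 for t in theta)


def train(steps=200, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        theta = es_sam_step(toy_reward, theta, rng=rng)
    return theta


if __name__ == "__main__":
    print(train())
```

In an LLM setting the reward would come from scoring generated answers (for example, correctness on GSM8K problems), and `theta` would be the model weights. Both are far beyond this toy, which only shows the shape of the update.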

🔑 Implications and Limitations

• Presents a new fine-tuning method that can effectively improve the mathematical reasoning ability of LLMs even when GPU memory is limited.
• Compared with existing RL-based methods, the proposed ESSAM greatly reduces memory usage without degrading performance, which makes fine-tuning more accessible.
• The fast variant shows the potential to speed up computation while maintaining performance.
• The study focuses mainly on mathematical reasoning tasks, so ESSAM's generalization performance still needs to be validated on a broader range of LLM tasks.