Sign In

Discovering Multiagent Learning Algorithms with Large Language Models

์ž‘์„ฑ์ž
  • Haebom
์นดํ…Œ๊ณ ๋ฆฌ
Empty

์ €์ž

Zun Li, John Schultz, Daniel Hennes, Marc Lanctot

๐Ÿ’ก ๊ฐœ์š”

๋ณธ ์—ฐ๊ตฌ๋Š” ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLM)์„ ํ™œ์šฉํ•˜์—ฌ ๋ถˆ์™„์ „ ์ •๋ณด ๊ฒŒ์ž„์—์„œ์˜ ๋‹ค์ค‘ ์—์ด์ „ํŠธ ๊ฐ•ํ™” ํ•™์Šต(MARL) ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ž๋™์œผ๋กœ ๋ฐœ๊ฒฌํ•˜๋Š” ์ƒˆ๋กœ์šด ์ ‘๊ทผ ๋ฐฉ์‹์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. LLM ๊ธฐ๋ฐ˜์˜ ์—์ด์ „ํŠธ ํ”„๋ ˆ์ž„์›Œํฌ์ธ AlphaEvolve๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ CFR๊ณผ PSRO๋ผ๋Š” ๋‘ ๊ฐ€์ง€ ๊ฒŒ์ž„ ์ด๋ก  ํŒจ๋Ÿฌ๋‹ค์ž„์—์„œ ์ƒˆ๋กœ์šด ์•Œ๊ณ ๋ฆฌ์ฆ˜์ธ VAD-CFR๊ณผ SHOR-PSRO๋ฅผ ์„ฑ๊ณต์ ์œผ๋กœ ํƒ์ƒ‰ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์€ ๊ธฐ์กด ์ตœ์ฒจ๋‹จ ์ธ๊ฐ„ ์„ค๊ณ„ ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ๊ฒฝ์Ÿ๋ ฅ์„ ๋ณด์˜€์œผ๋ฉฐ, ์ดํ›„ ์ฒด๊ณ„์ ์ธ ์ œ๊ฑฐ ์—ฐ๊ตฌ๋ฅผ ํ†ตํ•ด ํ•ต์‹ฌ์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ฝ”์–ด์˜ ์ค‘์š”์„ฑ์„ ๋ฐํ˜€๋‚ด๊ณ , ๋” ๋‚˜์•„๊ฐ€ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์ด ๋›ฐ์–ด๋‚˜๊ณ  ๋ณต์žก์„ฑ์ด ์ค„์–ด๋“  WOP-CFR๊ณผ PM-PSRO๋ผ๋Š” ๋‘ ๊ฐ€์ง€ ์ตœ์†Œํ™”๋œ ์†”๋ฒ„๋ฅผ ๊ฐœ๋ฐœํ–ˆ์Šต๋‹ˆ๋‹ค.

๐Ÿ”‘ ์‹œ์‚ฌ์  ๋ฐ ํ•œ๊ณ„

โ€ข
LLM์„ ํ™œ์šฉํ•˜์—ฌ ๋ณต์žกํ•œ MARL ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ž๋™์œผ๋กœ ๋ฐœ๊ฒฌํ•˜๊ณ  ๋ฐœ์ „์‹œํ‚ฌ ์ˆ˜ ์žˆ์Œ์„ ์ž…์ฆํ–ˆ์Šต๋‹ˆ๋‹ค.
โ€ข
LLM์ด ๋ฐœ๊ฒฌํ•œ ๋ณต์žกํ•œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์—์„œ ํ•ต์‹ฌ์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ฝ”์–ด๋ฅผ ๋ถ„๋ฆฌํ•˜์—ฌ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์„ ๋†’์ด๊ณ  ๋ณต์žก์„ฑ์„ ์ค„์ด๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์‹œํ–ˆ์Šต๋‹ˆ๋‹ค.
โ€ข
LLM์ด ํŠน์ • ํ›ˆ๋ จ ์„ธํŠธ์— ์ตœ์ ํ™”๋œ ๋ณต์žกํ•œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ์ƒ์„ฑํ•˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์–ด, ์ด๋Ÿฌํ•œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์ด ๋ฐ˜๋“œ์‹œ ์ตœ์ ์˜ ์ผ๋ฐ˜ํ™”๋กœ ์ด์–ด์ง€๋Š” ๊ฒƒ์€ ์•„๋‹™๋‹ˆ๋‹ค.
๐Ÿ‘