
Distributionally Robust Cooperative Multi-Agent Reinforcement Learning via Robust Value Factorization

Created by
  • Haebom

Authors

Chengrui Qu, Christopher Yeh, Kishan Panaganti, Eric Mazumdar, Adam Wierman

💡 Overview

This paper points out that uncertainty in real-world environments undermines the reliability of existing cooperative multi-agent reinforcement learning (MARL). To address this, it proposes a new principle, Distributionally Robust IGM (DrIGM), which requires each agent's robust greedy action to coincide with the robust team-optimal joint action. Through a new definition of robust individual action values, the method stays compatible with decentralized execution and provides provable robustness guarantees for the system as a whole.
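The principle described above can be written compactly. The following is only a paraphrase of the summary in notation that is assumed here rather than taken from the paper: $Q^{\mathrm{rob}}_{\mathrm{tot}}$ stands for a robust joint action value, $Q^{\mathrm{rob}}_i$ for agent $i$'s robust individual action value, $h_i$ for agent $i$'s local history, and $n$ for the number of agents. The paper's exact definition may differ, for example in how ties between maximizers are handled.

```latex
\[
\arg\max_{\mathbf{a}} Q^{\mathrm{rob}}_{\mathrm{tot}}(\mathbf{h}, \mathbf{a})
=
\Bigl(
  \arg\max_{a_1} Q^{\mathrm{rob}}_{1}(h_1, a_1),\;
  \dots,\;
  \arg\max_{a_n} Q^{\mathrm{rob}}_{n}(h_n, a_n)
\Bigr)
\]
```

Read this way, each agent can pick its own greedy action from local information alone, and the resulting joint action is still greedy with respect to the robust team value.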

🔑 Implications and Limitations

• Improved robustness of MARL to real-world uncertainty: the approach can improve the performance and stability of MARL systems under the kinds of uncertainty that arise in practice, such as the gap between simulation and the real environment, model mismatch, and system noise.
• Decentralized execution with robustness guarantees: the proposed DrIGM principle keeps the usual greedy execution scheme in decentralized settings while mathematically guaranteeing robustness of the overall system.
• Compatibility with existing MARL methods and scalability: the principle integrates easily into existing value-factorization architectures such as VDN, QMIX, and QTRAN, and it allows training without designing a separate reward for each agent while remaining scalable.
• Limitations: the theoretical robustness guarantees may hold only under specific assumptions, and the method may not deliver the same level of improvement for every type of uncertainty. Further validation in diverse, complex real-world environments is needed, along with research that combines the method with uncertainty-modeling techniques.
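To make the value-factorization point above concrete, here is a minimal sketch of the standard (non-robust) IGM check for a VDN-style additive factorization. It is not the paper's algorithm: the helper names and the toy values are hypothetical, and in the robust setting the per-agent values would be replaced by robust individual action values, whose definition the summary does not spell out.

```python
from itertools import product


def greedy(values):
    """Return the index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def vdn_joint_value(per_agent_q, joint_action):
    """VDN-style mixing: the team value is the sum of per-agent values."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def satisfies_igm(per_agent_q):
    """Check that the per-agent greedy actions form a team-greedy joint action."""
    decentralized = tuple(greedy(q) for q in per_agent_q)
    # Enumerate every joint action; only feasible for tiny toy problems.
    joint_actions = product(*(range(len(q)) for q in per_agent_q))
    best_value = max(vdn_joint_value(per_agent_q, ja) for ja in joint_actions)
    return vdn_joint_value(per_agent_q, decentralized) == best_value, decentralized


# Hypothetical per-agent action values: two agents, three actions each.
toy_q = [[0.2, 1.5, -0.3], [0.9, 0.1, 0.4]]
holds, joint = satisfies_igm(toy_q)
print(holds, joint)
```

Because the mixer is a plain sum, maximizing each term separately also maximizes the total, which is why the check passes for any additive factorization. Monotonic mixers such as QMIX preserve the same property by construction.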