
MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning

Created by
  • Haebom

Authors

Tianmeng Hu, Biao Luo, Chunhua Yang, Tingwen Huang

💡 Overview

This paper proposes MO-MIX, a deep reinforcement learning method for multi-objective multi-agent cooperative decision-making, a class of problems in which agents must cooperate while satisfying several objectives at once. MO-MIX builds on the centralized training with decentralized execution (CTDE) framework. A weight vector representing the preference over objectives is fed into each agent network to estimate the local action-value functions, and a mixing network with a parallel architecture estimates the joint action-value function. The method also introduces an exploration guide approach to improve the uniformity of the final non-dominated solution set. Experiments show that MO-MIX effectively generates an approximation of the Pareto set, with better performance and lower computational cost than existing methods.
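The preference-conditioned design described above can be illustrated with a toy sketch. It is not the authors' implementation. The linear stand-in for the agent network, the function names (`local_q`, `greedy_action`, `mix_per_objective`), the assumption that each agent outputs one Q-value per objective for every action, and the non-negative (QMIX-style) mixing weights per objective track are all assumptions made for illustration.

```python
import random

def local_q(obs, pref, params):
    """Toy 'agent network': a linear map from [obs, pref] to Q-values.

    Returns q[action][objective]. A real agent network would be a neural
    network; the linear form is an assumption for illustration only.
    """
    x = list(obs) + list(pref)  # the preference vector is part of the input
    return [[sum(w * xi for w, xi in zip(row, x)) for row in per_action]
            for per_action in params]

def greedy_action(q, pref):
    """Pick the action whose preference-weighted Q-value is largest."""
    scores = [sum(p * qo for p, qo in zip(pref, q_a)) for q_a in q]
    return max(range(len(scores)), key=scores.__getitem__)

def mix_per_objective(chosen_qs, mix_weights):
    """Parallel mixing: one track per objective combines the agents' Q-values.

    chosen_qs[agent][objective] holds the local Q-values of the chosen actions.
    mix_weights[objective][agent] is used in absolute value, so each joint
    Q-value is monotonic in every local Q-value (an assumed QMIX-style
    constraint, not something this summary confirms).
    """
    return [sum(abs(track[a]) * chosen_qs[a][o] for a in range(len(chosen_qs)))
            for o, track in enumerate(mix_weights)]

# Hypothetical sizes and random parameters, for demonstration only.
rng = random.Random(0)
n_agents, n_actions, n_obj, obs_dim = 2, 3, 2, 4
pref = [0.7, 0.3]  # preference over the two objectives

chosen_qs = []
for _ in range(n_agents):
    obs = [rng.uniform(-1, 1) for _ in range(obs_dim)]
    params = [[[rng.uniform(-1, 1) for _ in range(obs_dim + n_obj)]
               for _ in range(n_obj)] for _ in range(n_actions)]
    q = local_q(obs, pref, params)      # each agent acts on local information
    chosen_qs.append(q[greedy_action(q, pref)])

mix_weights = [[rng.uniform(-1, 1) for _ in range(n_agents)] for _ in range(n_obj)]
q_tot = mix_per_objective(chosen_qs, mix_weights)  # one joint Q per objective
print(q_tot)
```

In this sketch, changing `pref` changes both the inputs to every agent and the scalarization used to pick actions, which is how a single set of parameters could serve many trade-offs between objectives.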

🔑 Implications and Limitations

• Presents an effective deep reinforcement learning solution for real-world problems in which multiple agents must achieve several conflicting objectives at the same time.
• The objective-preference weight vector can steer each agent to focus more on particular objectives, which allows flexible application across a range of scenarios.
• Generating an approximation of the Pareto set makes it possible to explore a variety of trade-off points.
• Improving the efficiency and scalability of the exploration guide approach, and its robustness to abnormal situations that can arise in complex environments, remains future work.
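The notion of a non-dominated solution set that underlies the Pareto-set approximation can be made concrete with a short sketch. The return vectors below are made-up values, not results from the paper, and the helper names `dominates` and `non_dominated` are hypothetical. The sketch assumes every objective is to be maximized.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (assuming all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical two-objective returns, e.g. one per preference vector tried.
returns = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (0.5, 4.5)]
front = non_dominated(returns)
print(front)  # [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0)]
```

Here `(2.0, 2.0)` is dropped because `(2.0, 4.0)` is at least as good on both objectives, and `(0.5, 4.5)` is dropped because `(1.0, 5.0)` beats it on both. The three remaining points are the trade-offs a decision-maker would choose among.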