Sign In

Deep Double Q-learning

์ž‘์„ฑ์ž
  • Haebom
์นดํ…Œ๊ณ ๋ฆฌ
Empty

์ €์ž

Prabhat Nagarajan, Martha White, Marlos C. Machado

๐Ÿ’ก ๊ฐœ์š”

๋ณธ ๋…ผ๋ฌธ์€ Q-learning์˜ ๊ณผ๋Œ€ํ‰๊ฐ€ ํŽธํ–ฅ์„ ์™„ํ™”ํ•˜๋Š” Double Q-learning์„ ์‹ฌ์ธต ๊ฐ•ํ™”ํ•™์Šต์— ์ ์šฉํ•œ Deep Double Q-learning(DDQL)์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. DDQL์€ ๋‘ ๊ฐœ์˜ ๋…๋ฆฝ์ ์ธ Q ํ•จ์ˆ˜๋ฅผ ๋ช…์‹œ์ ์œผ๋กœ ํ•™์Šตํ•˜๊ณ , ์ด๋ฅผ ํ†ตํ•ด ์‹ฌ์ธต ๊ฐ•ํ™”ํ•™์Šต์—์„œ ๋ฐœ์ƒํ•˜๋Š” ๊ณผ๋Œ€ํ‰๊ฐ€ ํŽธํ–ฅ์„ ํšจ๊ณผ์ ์œผ๋กœ ์ค„์ž…๋‹ˆ๋‹ค. Atari 2600 ๊ฒŒ์ž„ 57๊ฐœ์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ, DDQL์€ Double DQN ๋Œ€๋น„ ์ „๋ฐ˜์ ์ธ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋ฉฐ 47๊ฐœ ๊ฒŒ์ž„์—์„œ ๋” ๋‚˜์€ ์„ฑ๊ณผ๋ฅผ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.

๐Ÿ”‘ ์‹œ์‚ฌ์  ๋ฐ ํ•œ๊ณ„

โ€ข
DDQL์€ ์‹ฌ์ธต ๊ฐ•ํ™”ํ•™์Šต์—์„œ Q ํ•จ์ˆ˜ ์ถ”์ •์˜ ๊ณผ๋Œ€ํ‰๊ฐ€ ํŽธํ–ฅ์„ ํšจ๊ณผ์ ์œผ๋กœ ์™„ํ™”ํ•˜์—ฌ ํ•™์Šต ์•ˆ์ •์„ฑ์„ ๋†’์ด๊ณ  ์„ฑ๋Šฅ์„ ๊ฐœ์„ ํ•ฉ๋‹ˆ๋‹ค.
โ€ข
๋‘ ๊ฐœ์˜ ๋…๋ฆฝ์ ์ธ Q ํ•จ์ˆ˜ ํ•™์Šต๊ณผ ์ถ”๊ฐ€์ ์ธ ์•ˆ์ •ํ™” ๊ธฐ๋ฒ•(๋‚ฎ์€ ๋ฆฌํ”Œ๋ ˆ์ด ๋น„์œจ, ๊ธด ํƒ€๊ฒŸ ๋„คํŠธ์›Œํฌ ์—…๋ฐ์ดํŠธ ๊ฐ„๊ฒฉ, ๊ณต์œ  ๋ ˆ์ด์–ด)์˜ ์กฐํ•ฉ์ด DDQL์˜ ํ•ต์‹ฌ์ ์ธ ์žฅ์ ์ž…๋‹ˆ๋‹ค.
โ€ข
๋…ผ๋ฌธ์€ ์‹ฌ์ธต ๊ฐ•ํ™”ํ•™์Šต์— Double Q-learning์„ ์ ์šฉํ•  ๋•Œ ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ, ๋ฆฌํ”Œ๋ ˆ์ด ๋น„์œจ, ๋ฏธ๋‹ˆ๋ฐฐ์น˜ ์ƒ˜ํ”Œ๋ง ์ „๋žต ๋“ฑ ์ฃผ์š” ์„ค๊ณ„ ์„ ํƒ์— ๋Œ€ํ•œ ์—ฐ๊ตฌ๋ฅผ ์ˆ˜ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค.
๐Ÿ‘