Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback

Created by
  • Haebom

Authors

Jiaye Lin, Mengdi Li, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, Di Wang

💡 Overview

๋ณธ ๋…ผ๋ฌธ์€ AI ํ”ผ๋“œ๋ฐฑ ๊ธฐ๋ฐ˜ ๊ฐ•ํ™”ํ•™์Šต(RLAIF)์œผ๋กœ ํ•™์Šต๋œ ๋ณด์ƒ ๋ชจ๋ธ์˜ ๋‚ฎ์€ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ๋‚œ์ด๋„์— ๋”ฐ๋ฅธ ์ปค๋ฆฌํ˜๋Ÿผ ํ•™์Šต์„ ํ†ตํ•ด ๋ณด์ƒ ๋ชจ๋ธ์˜ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ณ , ์ด๋ฅผ ํ†ตํ•ด ์ •์ฑ… ๋ชจ๋ธ์˜ ์ •๋ ฌ ์„ฑ๋Šฅ์„ ํฌ๊ฒŒ ๊ฐœ์„ ํ•˜๋Š” ์ƒˆ๋กœ์šด ํ”„๋ ˆ์ž„์›Œํฌ์ธ Curriculum-RLAIF๋ฅผ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ๋ณ„๋„์˜ ์ถ”๋ก  ๋น„์šฉ ์ฆ๊ฐ€ ์—†์ด ๊ธฐ์กด ๊ธฐ๋ฒ• ๋Œ€๋น„ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

🔑 Implications and Limitations

  • The degraded generalization of RLAIF reward models can be addressed effectively with curriculum learning based on data difficulty.
  • The proposed Curriculum-RLAIF framework is simple, yet delivers efficient and effective performance gains over existing methods.
  • Future work should extend Curriculum-RLAIF to a wider range of datasets and tasks and explore optimal strategies for constructing the curriculum.