
CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

Created by Haebom

Authors

Zhehao Tan, Yihan Jiao, Dan Yang, Junjie Wang, Duolin Sun, Jie Feng, Xidong Wang, Lei Liu, Yue Shen, Jian Wang, Jinjie Gu

💡 Overview

This study proposes CTRL-RAG, a new reinforcement learning (RL) method for improving context faithfulness and reasoning in retrieval-augmented generation (RAG) models. Existing RL methods for RAG rely on external rewards, so they cannot assess whether a response is actually faithful to the retrieved documents. To address this, CTRL-RAG introduces a Contrastive Likelihood Reward (CLR) that directly optimizes the gap between the log-likelihood of a response conditioned on the prompt with evidence and its log-likelihood conditioned on the prompt alone. This helps the model extract relevant evidence more effectively and become more confident when its answer is grounded in a specific context.
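The contrastive likelihood idea described above can be sketched as a difference of sequence log-likelihoods, CLR(y) = log p(y | x, e) − log p(y | x), where x is the prompt, e the evidence, and y the response. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the per-token log-probabilities of the same response y have already been computed by the policy model under both contexts. The function names and the optional length normalization are illustrative assumptions.

```python
import math
from typing import Sequence


def sequence_log_likelihood(token_logprobs: Sequence[float], length_normalize: bool = False) -> float:
    """Log-likelihood of a response from its per-token log-probabilities.

    Returns the sum, or the per-token mean if length_normalize is True
    (the normalization option is an assumption, not taken from the paper).
    """
    if len(token_logprobs) == 0:
        raise ValueError("response must contain at least one token")
    total = math.fsum(token_logprobs)
    return total / len(token_logprobs) if length_normalize else total


def contrastive_likelihood_reward(
    logprobs_with_evidence: Sequence[float],
    logprobs_without_evidence: Sequence[float],
    length_normalize: bool = False,
) -> float:
    """Hypothetical CLR: log p(y | x, e) - log p(y | x) for one response y.

    Both inputs score the SAME response tokens, so their lengths must match.
    A positive value means the evidence e made the response more likely.
    """
    if len(logprobs_with_evidence) != len(logprobs_without_evidence):
        raise ValueError("both log-prob lists must cover the same response tokens")
    with_ev = sequence_log_likelihood(logprobs_with_evidence, length_normalize)
    without_ev = sequence_log_likelihood(logprobs_without_evidence, length_normalize)
    return with_ev - without_ev


if __name__ == "__main__":
    # Toy per-token log-probs for a 3-token answer (made-up numbers).
    grounded = contrastive_likelihood_reward([-0.1, -0.2, -0.3], [-1.0, -1.5, -2.0])
    print(f"CLR for an evidence-grounded answer: {grounded:.2f}")  # about 3.90
```

A positive value means the evidence made the response more likely, which is the behavior such a reward would encourage. A response the model would produce just as readily without the evidence earns roughly zero.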

🔑 Implications and Limitations

• Presents a new hybrid internal-external reward framework for improving the context faithfulness and reasoning ability of RAG models.
• Uses the Contrastive Likelihood Reward (CLR) to strengthen the model's intrinsic ability to generate context-grounded responses without relying on external rewards.
• Demonstrates strong experimental performance across a range of benchmarks, contributing to more reliable and accurate RAG models.
• Although CLR serves as an internal reward, further research may be needed on how it combines with additional external rewards, or on self-improvement mechanisms, to ensure long-term performance stability and prevent potential model collapse.