
InfoPO: Information-Driven Policy Optimization for User-Centric Agents

Created by
  • Haebom

Authors

Fanqi Kong, Jiayi Zhang, Mingyi Deng, Chenglin Wu, Yuyu Luo, Bang Liu

💡 Overview

๋ณธ ๋…ผ๋ฌธ์€ LLM ์—์ด์ „ํŠธ๊ฐ€ ์‹ค์ œ ์‚ฌ์šฉ์ž ์š”์ฒญ์„ ์ฒ˜๋ฆฌํ•  ๋•Œ ๋ฐœ์ƒํ•˜๋Š” ์ •๋ณด ๋ถ€์กฑ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด InfoPO(Information-Driven Policy Optimization)๋ผ๋Š” ์ƒˆ๋กœ์šด ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. InfoPO๋Š” ๋‹ค์ค‘ ํ„ด ์ƒํ˜ธ์ž‘์šฉ์„ ๋ถˆํ™•์‹ค์„ฑ ๊ฐ์†Œ ๊ณผ์ •์œผ๋กœ ๋ณด๊ณ , ์ •๋ณด ํš๋“์œผ๋กœ ์ธํ•ด ์—์ด์ „ํŠธ์˜ ํ–‰๋™ ๋ถ„ํฌ๊ฐ€ ์–ผ๋งˆ๋‚˜ ๋ณ€ํ™”ํ•˜๋Š”์ง€์— ๊ธฐ๋ฐ˜ํ•œ ์ •๋ณด ์ด๋“ ๋ณด์ƒ์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ž‘์—… ๊ฒฐ๊ณผ ๋ณด์ƒ๊ณผ ๊ฒฐํ•ฉํ•˜์—ฌ ์ •๋ณด์˜ ์ค‘์š”์„ฑ์„ ํŒŒ์•…ํ•˜๊ณ  ์‚ฌ์šฉ์ž ์ค‘์‹ฌ์˜ ํ˜‘์—…์„ ์ตœ์ ํ™”ํ•ฉ๋‹ˆ๋‹ค.

🔑 Takeaways and Limitations

• Presents a new direction for building LLM agents that respond effectively to incomplete user requests
• Quantifies the value of the information-gathering process, enabling more efficient reinforcement-learning-based agent training
• Demonstrates robustness to changes in the user simulator and generalization to new environments
• Further research is needed on the computational complexity of the information-gain reward and on its applicability to real user interactions