Sign In

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation

Created by
  • Haebom
Category
Empty

์ €์ž

Jingqi Zhou, Sheng Wang, DeZhao Deng, Junwen Lu, Junwei Su, Qintong Li, Jiahui Gao, Hao Wu, Jiyue Jiang, Lingpeng Kong, Chuan Wu

๐Ÿ’ก ๊ฐœ์š”

๋ณธ ๋…ผ๋ฌธ์€ LLM ๊ธฐ๋ฐ˜ ์—์ด์ „ํŠธ๊ฐ€ ์ •์ ์ธ ๊ตฌ์„ฑ์œผ๋กœ ์ธํ•ด ๋ฐœ์ƒํ•˜๋Š” ๋ณต์žกํ•œ ์žฅ๊ธฐ ๊ณผ์ œ ์ˆ˜ํ–‰์˜ ์–ด๋ ค์›€์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด 'ToolSelf'๋ผ๋Š” ์ƒˆ๋กœ์šด ํŒจ๋Ÿฌ๋‹ค์ž„์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ToolSelf๋Š” ๊ตฌ์„ฑ ์—…๋ฐ์ดํŠธ๋ฅผ ๋„๊ตฌ๋กœ ์ถ”์ƒํ™”ํ•˜์—ฌ ์—์ด์ „ํŠธ๊ฐ€ ์‹คํ–‰ ์ค‘์— ์Šค์Šค๋กœ๋ฅผ ์žฌ๊ตฌ์„ฑํ•˜๋„๋ก ํ•จ์œผ๋กœ์จ, ์™ธ๋ถ€ ๊ทœ์น™์—์„œ ๋‚ด๋ถ€ ๋งค๊ฐœ๋ณ€์ˆ˜๋กœ์˜ ์ „ํ™˜์„ ์ด๋ฃจ์–ด๋ƒ…๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์—์ด์ „ํŠธ๋Š” ๊ณผ์ œ ์ง„ํ–‰์— ๋”ฐ๋ผ ํ•˜์œ„ ๋ชฉํ‘œ์™€ ๋งฅ๋ฝ์„ ์ž์œจ์ ์œผ๋กœ ์—…๋ฐ์ดํŠธํ•˜๊ณ , ์ „๋žต ๋ฐ ๋„๊ตฌ ์ƒ์ž๋ฅผ ์กฐ์ •ํ•˜์—ฌ ๊ณผ์ œ ์ˆ˜ํ–‰๊ณผ ์ž๊ธฐ ๊ด€๋ฆฌ์˜ ์ด์ค‘ ๊ด€๋ฆฌ์ž๋กœ ๋ณ€๋ชจํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ”‘ ์‹œ์‚ฌ์  ๋ฐ ํ•œ๊ณ„

โ€ข
๋™์  ์ž๊ธฐ ์ ์‘ ์—์ด์ „ํŠธ ๊ฐ€๋Šฅ์„ฑ ์ œ์‹œ: ToolSelf๋Š” ์—์ด์ „ํŠธ๊ฐ€ ๋‹จ์ˆœํžˆ ์ฃผ์–ด์ง„ ๊ณผ์ œ๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒƒ์„ ๋„˜์–ด, ๊ณผ์ œ ์ˆ˜ํ–‰ ๊ณผ์ •์—์„œ ์Šค์Šค๋กœ์˜ ๊ตฌ์„ฑ๊ณผ ์ „๋žต์„ ๋™์ ์œผ๋กœ ์กฐ์ •ํ•˜๋Š” ๋Šฅ๋ ฅ์„ ๋ถ€์—ฌํ•˜์—ฌ ์ง„์ •ํ•œ ์˜๋ฏธ์˜ ์ž๊ธฐ ์ ์‘ ์—์ด์ „ํŠธ ์‹œ๋Œ€๋ฅผ ์—ด ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
โ€ข
๊ธฐ์กด ๋ฐฉ๋ฒ•๋ก ์˜ ํ•œ๊ณ„ ๊ทน๋ณต: ์ˆ˜๋™ ์กฐ์ •์ด๋‚˜ ํœด๋ฆฌ์Šคํ‹ฑ ๊ธฐ๋ฐ˜ ์ ‘๊ทผ ๋ฐฉ์‹์˜ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ ์ €ํ•˜ ๋ฐ ํŒŒํŽธํ™”๋œ ์ตœ์ ํ™” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ , ๋‹จ์ผ ์•ก์…˜ ๊ณต๊ฐ„์—์„œ ๊ณผ์ œ ์‹คํ–‰๊ณผ ์ž๊ธฐ ์กฐ์ •์„ ํ†ตํ•ฉํ•˜์—ฌ ํšจ์œจ์„ฑ์„ ๋†’์ž…๋‹ˆ๋‹ค.
โ€ข
์ƒˆ๋กœ์šด ํ›ˆ๋ จ ๋ฐฉ๋ฒ•๋ก  (CAT) ์ œ์•ˆ: Rejection sampling fine-tuning๊ณผ trajectory-level reinforcement learning์„ ๊ฒฐํ•ฉํ•œ Configuration-Aware Two-stage Training (CAT) ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ๋ฉ”ํƒ€ ๋Šฅ๋ ฅ์„ ๋‚ด์žฌํ™”ํ•˜๋Š” ๋ฐฉ์‹์„ ๊ตฌ์ฒด์ ์œผ๋กœ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.
โ€ข
ํ•œ๊ณ„์  ๋˜๋Š” ํ–ฅํ›„ ๊ณผ์ œ: ์‹ค์ œ ๋ณต์žกํ•˜๊ณ  ์˜ˆ์ธก ๋ถˆ๊ฐ€๋Šฅํ•œ ์‹ค์ œ ํ™˜๊ฒฝ์—์„œ์˜ ToolSelf์˜ ์„ฑ๋Šฅ ๊ฒ€์ฆ ๋ฐ ํ™•์žฅ, ๊ทธ๋ฆฌ๊ณ  ์—์ด์ „ํŠธ์˜ ์ž๊ธฐ ์žฌ๊ตฌ์„ฑ ๊ณผ์ •์—์„œ์˜ ์ž ์žฌ์ ์ธ ์˜ค๋ฅ˜ ๋ฐœ์ƒ ๊ฐ€๋Šฅ์„ฑ ๋ฐ ์ด๋ฅผ ์ œ์–ดํ•˜๊ธฐ ์œ„ํ•œ ์ถ”๊ฐ€์ ์ธ ์—ฐ๊ตฌ๊ฐ€ ํ•„์š”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๐Ÿ‘