Sign In

Trajectory Supervision for Continual Tool-Use Learning in LLMs

์ž‘์„ฑ์ž
  • Haebom
์นดํ…Œ๊ณ ๋ฆฌ
Empty

์ €์ž

Vishnu Vardhan Reddy, Sagnik Chatterjee, Soumik Bhatta

๐Ÿ’ก ๊ฐœ์š”

์ด ๋…ผ๋ฌธ์€ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLM)์ด ์ƒˆ๋กœ์šด API ๋„๋ฉ”์ธ์„ ์ง€์†์ ์œผ๋กœ ํ•™์Šตํ•  ๋•Œ, ์ค‘๊ฐ„ API ํ˜ธ์ถœ ๊ณผ์ •์„ ํฌํ•จํ•˜๋Š” '๋„๊ตฌ ์‚ฌ์šฉ ๊ถค์ (tool-use trajectory)'์„ ์ œ๊ณตํ•˜๋Š” ๊ฒƒ์ด ์œ ์šฉํ•œ์ง€ ํƒ๊ตฌํ•ฉ๋‹ˆ๋‹ค. ์ด์ „ API ํ˜ธ์ถœ ๋ฐ ์‘๋‹ต ๊ธฐ๋ก์„ ์ œ๊ฑฐํ•˜๋Š” ๋ฐฉ์‹(Condition A)๊ณผ ๋‹ฌ๋ฆฌ, ๊ถค์  ์ •๋ณด๋ฅผ ์œ ์ง€ํ•˜๋Š” ๋ฐฉ์‹(Condition B)์œผ๋กœ Llama 3.1 8B Instruct ๋ชจ๋ธ์„ API-Bank ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ๋ฏธ์„ธ ์กฐ์ •ํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ์ ์œผ๋กœ ๊ถค์  ์ •๋ณด๋ฅผ ์œ ์ง€ํ•œ Condition B๊ฐ€ ์ด์ „ ์ •๋ณด๋ฅผ ์ œ๊ฑฐํ•œ Condition A๋ณด๋‹ค ์ตœ์ข… API ํ˜ธ์ถœ์˜ ์ •ํ™•๋„๋ฅผ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๊ฒƒ์„ ๋ฐœ๊ฒฌํ–ˆ์Šต๋‹ˆ๋‹ค.

๐Ÿ”‘ ์‹œ์‚ฌ์  ๋ฐ ํ•œ๊ณ„

โ€ข
LLM์˜ ์ง€์†์ ์ธ ๋„๊ตฌ ํ•™์Šต ์‹œ, ์ค‘๊ฐ„ API ํ˜ธ์ถœ ๋ฐ ์‘๋‹ต ๊ณผ์ •์„ ํฌํ•จํ•˜๋Š” '๋„๊ตฌ ์‚ฌ์šฉ ๊ถค์ ' ์ •๋ณด๋Š” ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ์— ๊ธ์ •์ ์ธ ์˜ํ–ฅ์„ ๋ฏธ์น  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
โ€ข
๊ถค์  ์ •๋ณด๋ฅผ ํ™œ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ด์ „ API ์‚ฌ์šฉ ๊ธฐ๋ก์„ ์ œ๊ฑฐํ•˜๋Š” ๋ฐฉ์‹๋ณด๋‹ค ๋” ๋‚˜์€ API ์˜ˆ์ธก ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๋ฉฐ, ์ด๋Š” LLM์ด ๋ณต์žกํ•œ API ์ƒํ˜ธ์ž‘์šฉ์„ ๋” ์ž˜ ์ดํ•ดํ•˜๊ณ  ์žฌํ˜„ํ•  ์ˆ˜ ์žˆ์Œ์„ ์‹œ์‚ฌํ•ฉ๋‹ˆ๋‹ค.
โ€ข
๋ณธ ์—ฐ๊ตฌ๋Š” ๋‹จ์ผ ์‹œ๋“œ๋ฅผ ์‚ฌ์šฉํ•œ ํŒŒ์ผ๋Ÿฟ ์—ฐ๊ตฌ์ด๋ฉฐ, ๊ถค์  ์ •๋ณด๋ฅผ ์œ ์ง€ํ•  ๊ฒฝ์šฐ ํ›ˆ๋ จ ํ† ํฐ ์‚ฌ์šฉ๋Ÿ‰์ด ์ฆ๊ฐ€ํ•˜๋Š” ๋‹จ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, ์ตœ์ข… ๋ชฉํ‘œ๊ฐ€ ์ „์ฒด ๋Œ€ํ™” ์„ฑ๊ณต์ด ์•„๋‹Œ ๋‹ค์Œ API ํ˜ธ์ถœ ์˜ˆ์ธก์— ๊ตญํ•œ๋˜์–ด ์žˆ์–ด, ์‹ค์ œ์ ์ธ ๋Œ€ํ™” ์„ฑ๊ณต์œผ๋กœ ์ด์–ด์ง€๋Š”์ง€์— ๋Œ€ํ•œ ์ถ”๊ฐ€์ ์ธ ๊ฒ€์ฆ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
๐Ÿ‘