Language Model Goal Selection Differs from Humans' in a Self-Directed Learning Task

Written by
  • Haebom

Authors

Gaia Molinaro, Dave August, Danielle Perszyk, Anne G. E. Collins

💡 Overview

This study examines how well large language models (LLMs) reflect human goal-selection preferences in a self-directed learning task. Experiments with five LLMs revealed substantial differences between how humans and LLMs choose goals. Humans explored and achieved a diverse range of goals, whereas most LLMs either pursued a single solution or performed poorly.

🔑 Implications and Limitations

• Current LLMs do not accurately reflect human goal-selection preferences, so caution is warranted when using them as substitutes for humans.
• LLM goal-selection behavior shows distinct, model-specific patterns that fall short of the individual variability seen in humans.
• Techniques such as chain-of-thought reasoning and persona steering did not substantially narrow the goal-selection gap between LLMs and humans.
• The findings still need to be validated in real-world application settings, and they underscore the uniqueness of the human goal-selection process.