The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness

Created by
  • Haebom

Authors

Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary

💡 Overview

This paper argues that two research trajectories are bound to collide: situational awareness, regarded as a dangerous emergent capability of AI systems, and ongoing improvements in the logical reasoning abilities of large language models (LLMs). The authors propose the RAISE framework and describe three mechanisms through which stronger logical reasoning can develop into progressively deeper situational awareness: deductive self-inference, inductive context recognition, and abductive self-modeling.

🔑 Implications and Limitations

• Advances in logical reasoning can deepen an AI system's self-awareness and strategic reasoning, leading to unintended and sophisticated situational awareness.
• Current research topics in LLM logical reasoning can act directly as amplifiers of situational awareness, and existing safety measures are insufficient to prevent this escalation.
• The paper proposes new benchmarks such as the "Mirror Test" and concrete safeguards such as the "Reasoning Safety Parity Principle", and it stresses the responsibility of the logical reasoning research community.
• Concrete experimental results showing how the proposed mechanisms and framework would be implemented and evaluated in real LLMs are still lacking.