
When Chains of Thought Don't Matter: Causal Bypass in Large Language Models

Created by
  • Haebom

Authors

Anish Sathyanarayanan, Aditya Nagarsekar, Aarush Rathore

💡 Overview

This study raises a fundamental question about chain-of-thought (CoT) prompting, which is widely used with large language models (LLMs): does it actually reflect the model's reasoning process and guarantee transparency? The authors find that even when a CoT looks superficially faithful, the model's answer can be causally independent of the CoT's content, which is a significant counterargument against the utility of CoT.

🔑 Implications and Limitations

• The work challenges the prevailing assumption that CoT prompting transparently exposes a model's reasoning process, and it calls for a re-evaluation of LLM explainability.
• Surface-level faithfulness of a CoT is not enough to judge whether the model actually relies on that rationale when generating its answer, which suggests that "bypass circuits" may exist.
• The proposed diagnostic framework offers a useful tool for detecting manipulation signals in a CoT and measuring its causal influence. On certain tasks, however, a "complete bypass" is observed in which the CoT has negligible or almost no influence, so a deeper understanding of LLM reasoning mechanisms is needed.
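
To make the idea of "causal influence" concrete, here is a minimal sketch of one common kind of check: replace the CoT with a corrupted version and see whether the final answer changes. This is not the paper's diagnostic framework, whose details this summary does not give. The function `cot_influence`, the `answer_fn(question, cot)` interface, and both toy models are hypothetical stand-ins chosen only for illustration.

```python
from typing import Callable, Sequence

# Hypothetical interface: given a question and a CoT, return the final answer.
AnswerFn = Callable[[str, str], str]


def cot_influence(
    answer_fn: AnswerFn,
    questions: Sequence[str],
    cots: Sequence[str],
    corrupted_cots: Sequence[str],
) -> float:
    """Fraction of questions whose answer changes when the CoT is corrupted.

    A value near 0.0 means the answers barely depend on the CoT text,
    which is the pattern the summary calls a bypass.
    """
    if not questions:
        raise ValueError("need at least one question")
    if not (len(questions) == len(cots) == len(corrupted_cots)):
        raise ValueError("inputs must have the same length")
    changed = sum(
        answer_fn(q, c) != answer_fn(q, bad)
        for q, c, bad in zip(questions, cots, corrupted_cots)
    )
    return changed / len(questions)


# Toy stand-in 1: ignores the CoT entirely (answers from a lookup table).
MEMORIZED = {"What is 2+2?": "4", "What is 3+5?": "8"}


def bypass_model(question: str, cot: str) -> str:
    return MEMORIZED[question]


# Toy stand-in 2: copies the last token of the CoT as its answer.
def cot_reader_model(question: str, cot: str) -> str:
    return cot.split()[-1]


questions = ["What is 2+2?", "What is 3+5?"]
cots = ["2 plus 2 gives 4", "3 plus 5 gives 8"]
corrupted = ["2 plus 2 gives 5", "3 plus 5 gives 9"]

bypass_score = cot_influence(bypass_model, questions, cots, corrupted)
reader_score = cot_influence(cot_reader_model, questions, cots, corrupted)
print(bypass_score, reader_score)  # 0.0 1.0
```

With a real LLM, `answer_fn` would wrap a model call that conditions on the supplied CoT. How the CoT is corrupted (for example, changing an intermediate value or replacing the text outright) is a design choice that strongly affects what the score means.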