
When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic--Actor Loop for Agentic Reasoning

Created by
  • Haebom

Authors

Vasilis Niarchos, Constantinos Papageorgakis, Alexander G. Stapleton, Sokratis Trifinopoulos

💡 Overview

This study uses a framework called SCALAR (Structured Critic--Actor Loop for Agentic Reasoning) to investigate how the interaction between AI researchers and LLM agents affects the outcomes of reasoning in theoretical physics. SCALAR consists of an Actor that proposes solutions, a Critic that provides iterative feedback, and a Judge that performs the final evaluation. The framework is applied to problems in quantum field theory and string theory. Across variations in Actor persona, Critic feedback strategy, and Actor model family and scale, the authors find that multi-turn dialogue improves on single attempts. Both the mechanism behind the improvement and the value of particular prompt choices depend strongly on the Actor-Critic pairing.
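The Actor/Critic/Judge division described above can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the authors' implementation. The three roles are modeled as plain Python functions. The type aliases, `run_critic_actor_loop`, `max_turns`, the early stop when the Critic raises no objection, and the Judge seeing only the final solution are all hypothetical choices made for illustration. The toy agents stand in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical role signatures (not from the paper): in a real setup each
# function would wrap an LLM call with its own persona or feedback prompt.
ActorFn = Callable[[str, List[str]], str]        # (problem, critiques so far) -> solution
CriticFn = Callable[[str, str], Optional[str]]   # (problem, solution) -> critique, or None if satisfied
JudgeFn = Callable[[str, str], bool]             # (problem, final solution) -> pass/fail


@dataclass
class Transcript:
    solutions: List[str] = field(default_factory=list)
    critiques: List[str] = field(default_factory=list)
    passed: bool = False


def run_critic_actor_loop(problem: str, actor: ActorFn, critic: CriticFn,
                          judge: JudgeFn, max_turns: int = 3) -> Transcript:
    """Alternate Actor proposals and Critic feedback, then ask the Judge once."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    transcript = Transcript()
    for turn in range(max_turns):
        solution = actor(problem, transcript.critiques)
        transcript.solutions.append(solution)
        if turn == max_turns - 1:
            break  # last allowed turn: skip the Critic
        critique = critic(problem, solution)
        if critique is None:
            break  # Critic has no objections: stop early (an assumed policy)
        transcript.critiques.append(critique)
    # Assumption: the Judge is separate from the Critic and sees only the final solution.
    transcript.passed = judge(problem, transcript.solutions[-1])
    return transcript


# Toy stand-ins for LLM agents. The Critic and Judge "cheat" by comparing
# against a known reference answer, purely to keep the example deterministic.
PROBLEM = "What is the critical spacetime dimension of the bosonic string?"
REFERENCE = "26"


def toy_actor(problem: str, critiques: List[str]) -> str:
    # Gives the superstring value first; revises once it has any critique.
    return "26" if critiques else "10"


def toy_critic(problem: str, solution: str) -> Optional[str]:
    if solution == REFERENCE:
        return None
    return "10 is the critical dimension of the superstring, not the bosonic string."


def toy_judge(problem: str, solution: str) -> bool:
    return solution == REFERENCE


single = run_critic_actor_loop(PROBLEM, toy_actor, toy_critic, toy_judge, max_turns=1)
multi = run_critic_actor_loop(PROBLEM, toy_actor, toy_critic, toy_judge, max_turns=3)
print(single.passed, multi.passed)  # False True
print(multi.solutions)              # ['10', '26']
```

With `max_turns=1` the loop reduces to a single attempt, which fails here. With three turns the toy Actor revises its answer after one critique and the Judge passes it. This improvement is built into the stubs and is not evidence for the paper's finding. A real experiment would replace the stubs with model calls and vary the Actor persona, the Critic's feedback prompt, and the models used for each role.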

🔑 Implications and Limitations

• The division of roles between the Actor and the Critic, and the structure of their interaction, are key factors in how efficient AI-driven scientific discovery can be.
• In asymmetric Actor-Critic setups (a lightweight Actor paired with a strong Critic), constructive feedback is effective at improving performance.
• Scaling up the model helps on easier problems but does not resolve the hardest bottlenecks. Moreover, the optimal feedback strategy depends on the Actor-Critic pairing.
• The study offers a controlled testbed for evaluating interaction structures in AI-driven scientific discovery. However, its results are confined to specific areas of theoretical physics, so they should be generalized with caution.