Auditing Multi-Agent LLM Reasoning Trees Outperforms Majority Vote and LLM-as-Judge

Created by
  • Haebom

Authors

Wei Yang, Shixuan Li, Heng Ping, Peiyu Zhang, Paul Bogdan, Jesse Thomason

💡 Overview

This paper examines the limits of conventional majority voting as a way to scale LLM reasoning in multi-agent systems (MAS). It proposes AgentAuditor, a new auditing method built on a "reasoning tree" that explicitly represents the agents' reasoning processes. AgentAuditor performs localized verification at the branching points of the reasoning tree, which lets it reach a global judgment efficiently. The paper also presents ACPO, a technique that learns from cases where majority voting fails and comes to prefer evidence-backed minority opinions.
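To make the idea of auditing at branching points concrete, here is a minimal sketch, not the paper's implementation. It assumes each agent's output is a list of step strings plus a final answer, merges shared prefixes into a tree, and only consults a verifier where traces diverge. The names `Node`, `build_tree`, `audit`, and `toy_verifier` are hypothetical, and the verifier is a hard-coded stand-in for whatever LLM-based check a real system would use. ACPO training is not modeled.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Trace = Tuple[List[str], str]  # (reasoning steps, final answer); assumed format


@dataclass
class Node:
    """A point in the tree reached by every trace sharing the same step prefix."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    answers: List[str] = field(default_factory=list)  # answers of traces ending here


def build_tree(traces: Sequence[Trace]) -> Node:
    """Merge traces into a prefix tree so identical leading steps are shared."""
    root = Node()
    for steps, answer in traces:
        node = root
        for step in steps:
            node = node.children.setdefault(step, Node())
        node.answers.append(answer)
    return root


def majority_vote(traces: Sequence[Trace]) -> str:
    """Baseline: pick the most frequent final answer, ignoring the reasoning."""
    return Counter(answer for _, answer in traces).most_common(1)[0][0]


def audit(root: Node, verify_step: Callable[[List[str], str], float]) -> Optional[str]:
    """Walk the tree, calling the verifier only where the traces disagree."""
    node, prefix = root, []
    while node.children:
        if len(node.children) == 1:
            # No disagreement here, so no verification call is spent.
            step = next(iter(node.children))
        else:
            # Branching point: score each competing next step locally.
            step = max(node.children, key=lambda s: verify_step(prefix, s))
        prefix.append(step)
        node = node.children[step]
    return Counter(node.answers).most_common(1)[0][0] if node.answers else None


# Toy data: two of three agents share the same arithmetic slip.
traces: List[Trace] = [
    (["x = 12 / 4", "x = 4"], "4"),
    (["x = 12 / 4", "x = 4"], "4"),
    (["x = 12 / 4", "x = 3"], "3"),
]


def toy_verifier(prefix: List[str], step: str) -> float:
    """Hypothetical stand-in for an LLM check; hard-coded for this example."""
    return 1.0 if step == "x = 3" else 0.0


voted = majority_vote(traces)
audited = audit(build_tree(traces), toy_verifier)
```

In this toy case the majority answer follows the shared mistake, while the audit follows the branch the verifier supports. The verifier is called only at the single point where the traces split, which is the sense in which local checks can stand in for judging every full trace.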

🔑 Implications and Limitations

  • AgentAuditor can substantially improve the reliability and accuracy of reasoning in multi-agent LLM systems.
  • It outperforms both conventional majority voting and LLM-as-Judge, and it is particularly robust to false agreement among agents (confabulation consensus).
  • The proposed method applies across a variety of MAS settings and can yield accuracy gains of more than 5 percentage points in practice.
  • Representing and traversing the reasoning tree can become complex, so further research is needed on handling larger and more complex reasoning processes efficiently.