CollabEval: Enhancing LLM-as-a-Judge via Multi-Agent Collaboration

Created by
  • Haebom

Authors

Yiyue Qian, Shinan Zhang, Yun Zhou, Haibo Ding, Diego Socolinsky, Yi Zhang

💡 Overview

๋ณธ ๋…ผ๋ฌธ์€ ๋‹จ์ผ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLM)์„ ์‹ฌํŒ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ์‹์˜ ์ผ๊ด€์„ฑ ๋ถ€์กฑ ๋ฐ ํŽธํ–ฅ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์ค‘ ์—์ด์ „ํŠธ ํ˜‘์—… ๊ธฐ๋ฐ˜์˜ ํ‰๊ฐ€ ํ”„๋ ˆ์ž„์›Œํฌ์ธ CollabEval์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. CollabEval์€ ์ดˆ๊ธฐ ํ‰๊ฐ€, ๋‹ค์ค‘ ๋ผ์šด๋“œ ํ† ๋ก , ์ตœ์ข… ํŒ๋‹จ์˜ ์„ธ ๋‹จ๊ณ„๋ฅผ ๊ฑฐ์ณ ์ „๋žต์  ํ•ฉ์˜ ํ™•์ธ์„ ํ†ตํ•ด ํšจ์œจ์„ฑ์„ ๋†’์ž…๋‹ˆ๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ, CollabEval์€ ๊ธฐ์กด ๋‹จ์ผ LLM ํ‰๊ฐ€ ๋ฐฉ์‹๋ณด๋‹ค ์—ฌ๋Ÿฌ ์ฐจ์›์—์„œ ์šฐ์ˆ˜ํ•˜๋ฉฐ, ๊ฐœ๋ณ„ ๋ชจ๋ธ ์„ฑ๋Šฅ์ด ์ €ํ•˜๋  ๋•Œ๋„ ๊ฒฌ๊ณ ํ•œ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•จ์„ ์ž…์ฆํ–ˆ์Šต๋‹ˆ๋‹ค.

🔑 Implications and Limitations

• Shows that, within the LLM-as-a-judge paradigm, multi-agent collaboration can improve the accuracy and consistency of evaluation.
• Suggests that moving away from adversarial debate or single-model evaluation toward a process of collaboration and consensus enables evaluation that is both efficient and reliable.
• Offers comprehensive support for diverse evaluation criteria and achieves efficiency through its collaborative design.
• Further research may be needed on CollabEval's scalability, on performance optimization in specific domains, and on mechanisms for resolving conflicts between agents.