
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

Written by
  • Haebom

Authors

Tara Bogavelli, Gabrielle Gauthier Melançon, Katrina Stankiewicz, Oluwanifemi Bamgbose, Fanny Riols, Hoang H. Nguyen, Raghav Mehndiratta, Lindsay Devon Brin, Joseph Marinier, Hari Subramani, Anil Madamala, Sridhar Krishna Nemala, Srinivas Sunkara

💡 Overview

This paper proposes EVA-Bench, a new end-to-end framework for comprehensively evaluating voice agents used in enterprise settings. EVA-Bench addresses two core challenges: generating realistic simulated conversations, and measuring the full range of failure modes specific to voice agents. The proposed EVA-A (accuracy) and EVA-X (experience) metrics cover task completion, speech fidelity, conversation flow, gaps between turns, and other aspects, which makes direct comparison across systems possible.

🔑 Implications and Limitations

• No single system yet performs well on both accuracy (EVA-A) and user experience (EVA-X), which suggests that existing systems fall short in overall performance.
• The substantial gap between peak performance (pass@k) and reliable performance (pass^k) shows that voice agents still need to improve in stability and consistency.
• Performance degrades under varied accents and noisy conditions, exposing a lack of robustness in voice agents. This is an important challenge for real-world deployment.
• The EVA-Bench framework and dataset proposed in this work are released publicly to support future voice agent research and development.
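
To make the pass@k versus pass^k gap above concrete, here is a minimal Python sketch. It assumes the commonly used combinatorial estimators: pass@k as the chance that at least one of k sampled trials succeeds, and pass^k as the chance that all k succeed, given n recorded trials of a scenario with c successes. The summary does not state the exact formulas EVA-Bench uses, so the function names and example numbers are illustrative assumptions, not the authors' implementation.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    """Validate trial counts: 0 <= c <= n and 1 <= k <= n."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Peak performance (assumed definition): probability that at least
    one of k trials, drawn without replacement from n trials with c
    successes, is a success."""
    _check(n, c, k)
    # comb(n - c, k) is 0 when fewer than k failures exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Reliable performance (assumed definition): probability that all
    k trials, drawn without replacement from n trials with c successes,
    are successes."""
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical scenario: 5 trials, 3 of them successful, k = 2.
    n, c, k = 5, 3, 2
    print(f"pass@{k} = {pass_at_k(n, c, k):.2f}")
    print(f"pass^{k} = {pass_hat_k(n, c, k):.2f}")
```

With 3 successes in 5 trials and k = 2, the two estimates come out to 0.90 and 0.30: an agent that usually succeeds at least once can still fail to succeed every time. At k = 1 both reduce to the plain success rate c/n.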