Daily Arxiv

This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

AURA: A Diagnostic Framework for Tracking User Satisfaction of Interactive Planning Agents

Created by
  • Haebom

Authors

Takyoung Kim, Janvijay Singh, Shuhaib Mehri, Emre Can Acikgoz, Sagnik Mukherjee, Nimet Beyza Bozdag, Sumuk Shashidhar, Gokhan Tur, Dilek Hakkani-Tur

💡 Overview

This paper proposes AURA, a diagnostic framework for tracking user satisfaction with interactive planning agents built on large language models (LLMs). AURA conceptualizes the stages of an agent's behavior and uses LLM-based evaluation criteria to diagnose the agent's strengths and weaknesses. The results show that user satisfaction is affected not only by the final outcome but also by the agent's intermediate behaviors.
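
The idea of stage-wise diagnosis can be illustrated with a minimal sketch. It is not the paper's implementation: the stage labels, the 1-5 score scale, and the `diagnose` helper are all hypothetical, and in a real setup the scores would come from an LLM-based evaluator rather than hand-written numbers. The sketch only shows how per-stage scores, averaged over several dialogues, can point to the stage where an agent is weakest.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stage labels; the summary does not name AURA's actual stages.
STAGES = ("stage_1", "stage_2", "stage_3")


def diagnose(dialogue_scores):
    """Average per-stage scores across dialogues.

    Returns (averages, weakest), where averages maps each scored stage to its
    mean score and weakest is the stage with the lowest mean.
    """
    by_stage = defaultdict(list)
    for scores in dialogue_scores:
        for stage in STAGES:
            if stage in scores:  # a dialogue may not exercise every stage
                by_stage[stage].append(scores[stage])
    averages = {stage: mean(values) for stage, values in by_stage.items()}
    weakest = min(averages, key=averages.get)
    return averages, weakest


# Made-up scores on an assumed 1-5 scale, one dict per dialogue.
example = [
    {"stage_1": 4, "stage_2": 2, "stage_3": 5},
    {"stage_1": 5, "stage_2": 3, "stage_3": 4},
]
averages, weakest = diagnose(example)
print(averages, weakest)
```

Reporting a score per stage, rather than a single end-of-dialogue score, is what makes it possible to see that an agent reaches the goal while still handling an intermediate stage poorly.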

🔑 Implications and Limitations

  • The study shows that user satisfaction depends on the whole interaction process, not only on whether the agent achieves its final goal, and it suggests a new direction for designing agents that improve user satisfaction.
  • The AURA framework enables diagnosis at each specific stage of an agent's behavior, which can help agent developers identify areas for improvement and raise performance.
  • Future research needs to focus on developing systems that use multiple agents and on overcoming the limitations of user simulators in task planning.