Daily Arxiv

This page collects artificial intelligence papers published around the world.
The summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue

Created by
  • Haebom

Authors

Laya Iyer, Kriti Aggarwal, Sanmi Koyejo, Gail Heyman, Desmond C. Ong, Subhabrata Mukherjee

💡 Overview

This paper proposes HEART, the first unified benchmark for comparing how humans and LLMs perform in emotional support dialogue. HEART uses a five-dimension evaluation rubric and both human and LLM raters to directly compare human and LLM responses in multi-party dialogues. In the experiments, some state-of-the-art LLMs approached or exceeded average human responses on empathy and consistency, while humans retained an advantage in adaptive reframing, tension-naming, and nuanced tone shifts.

🔑 Implications and Limitations

• Advances in LLMs are making it possible for models to reach or exceed human-level performance in some areas of emotional support.
• Human and LLM raters both tend to judge the quality of support dialogues by similar criteria.
• Humans still hold an advantage in providing more nuanced and adaptive emotional support, especially in difficult conversational situations.
• The benchmark provides a unified empirical foundation for understanding how emotional dialogue ability scales with model size and for identifying the gaps between human social judgment and model-generated support.