Daily Arxiv

This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

MindEval: Benchmarking Language Models on Multi-turn Mental Health Support

Created by
  • Haebom

Authors

Jose Pombal, Maya D'Eon, Nuno M. Guerreiro, Pedro Henrique Martins, Antonio Farinhas, Ricardo Rei

💡 Overview

To address the limitations of AI chatbot-based mental health support systems, this paper proposes MindEval, a new benchmark that captures the complexity of real therapeutic conversations. MindEval was developed in collaboration with licensed clinical psychologists and measures model performance through patient simulation and LLM-based automatic evaluation. The results show that state-of-the-art LLMs struggle overall, with particular weaknesses in problematic, AI-specific patterns of communication.
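The evaluation flow summarized above (a simulated patient, a model under test, and an LLM-based judge) can be pictured with a minimal sketch. This is not the MindEval implementation: the function `run_session`, the role names, and the toy stand-ins are all hypothetical, and a real setup would call LLMs where the stubs are.

```python
from typing import Callable, Dict, List

# One chat message; the role names are illustrative assumptions.
Message = Dict[str, str]  # {"role": "patient" | "therapist", "content": "..."}


def run_session(
    patient: Callable[[List[Message]], str],
    therapist: Callable[[List[Message]], str],
    judge: Callable[[List[Message]], float],
    num_turns: int,
) -> Dict[str, object]:
    """Simulate a multi-turn session, then score the whole transcript."""
    transcript: List[Message] = []
    for _ in range(num_turns):
        # The simulated patient speaks first, then the model under test replies.
        transcript.append({"role": "patient", "content": patient(transcript)})
        transcript.append({"role": "therapist", "content": therapist(transcript)})
    # The judge sees the full conversation, not individual replies.
    return {"transcript": transcript, "score": judge(transcript)}


# Hypothetical stand-ins so the sketch runs without any model access.
def toy_patient(history: List[Message]) -> str:
    return f"patient message {len(history) // 2 + 1}"


def toy_therapist(history: List[Message]) -> str:
    return "reply to: " + history[-1]["content"]


def toy_judge(history: List[Message]) -> float:
    # Placeholder score: fraction of therapist replies that are non-empty.
    replies = [m for m in history if m["role"] == "therapist"]
    if not replies:
        return 0.0
    return sum(1 for m in replies if m["content"]) / len(replies)


result = run_session(toy_patient, toy_therapist, toy_judge, num_turns=3)
```

Passing the three roles in as plain callables keeps the loop independent of any particular model API, so the same harness could compare different models by swapping only the `therapist` argument.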

🔑 Implications and Limitations

• Addresses the lack of realistic benchmarks for developing mental health support chatbots by providing a framework that evaluates LLMs in a setting resembling real therapeutic conversations.
• The automated evaluation system makes models easy to compare and exposes their weaknesses, pointing to directions for improvement.
• Model scale or reasoning ability alone does not guarantee better performance, and performance tends to degrade as conversations grow longer or as the patient's symptoms become more severe.