
MINTEval: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems

Posted by
  • Haebom

Authors

Hyunji Lee, Justin Chih-Yao Chen, Joykirat Singh, Zaid Khan, Elias Stengel-Eskin, Mohit Bansal

💡 Overview

This paper proposes MINTEval, a new benchmark for evaluating memory-augmented agents in realistic settings where information is updated over long horizons and pieces of information interfere with one another. MINTEval covers intricately interleaved contexts, diverse domains, and both single-target and multi-target query types, and it focuses on the dynamic memory interactions that existing benchmarks have not addressed. In the experiments, current agents performed poorly overall and were especially weak on questions that require aggregating multiple pieces of information.

🔑 Implications and Limitations

• Importance of evaluating dynamic memory interactions over long horizons: Going beyond the limits of existing benchmarks, the paper offers a new standard for assessing the practical ability of agents that handle complex real-world information.
• Fundamental performance limits of memory-augmented agents: The results show that current memory systems struggle considerably to retrieve information accurately and reason over it as a whole, particularly when information is continually updated and interference occurs.
• Need to improve retrieval and memory construction: The paper finds that the drop in agent performance stems mainly from the retrieval stage and from how memory is constructed, which points to where future research should focus. Improving the ability to accurately recall and reason about past information that has been updated or interfered with remains a key challenge.