Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

Created by
  • Haebom

Authors

Yuanzhe Hu, Yu Wang, Julian McAuley

Outline

We present MemoryAgentBench, a new benchmark for evaluating memory, a core competency of large language model (LLM) agents. Existing benchmarks neither capture the interactive, multi-turn nature of memory agents nor cover all four core competencies: accurate retrieval, test-time learning, long-range understanding, and selective forgetting. MemoryAgentBench simulates the incremental information processing characteristic of memory agents by recasting existing long-context datasets, together with newly constructed datasets, into a multi-turn format. Evaluations of a range of memory agents show that no current method performs adequately across all four competencies, underscoring the need for further research on memory mechanisms.
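To make the incremental setup concrete, here is a minimal sketch in Python of how a long-context sample might be recast as a multi-turn interaction and fed to a memory agent one turn at a time. The `SimpleMemoryAgent` class, its naive overlap-based retrieval, and the sample text are illustrative assumptions, not the paper's actual interface or data.

```python
# A minimal sketch (assumed interface, not the paper's code) of incremental
# multi-turn ingestion: the agent receives a long context one turn at a time
# and must later answer from whatever it has retained.
import re
from dataclasses import dataclass, field

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, used for naive overlap-based retrieval."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

@dataclass
class SimpleMemoryAgent:
    """Hypothetical agent that stores each turn verbatim."""
    memory: list[str] = field(default_factory=list)

    def ingest(self, turn: str) -> None:
        # A real memory agent would summarize, index, or prune here.
        self.memory.append(turn)

    def answer(self, question: str) -> str:
        # Return the stored turn with the largest word overlap.
        q = tokens(question)
        return max(self.memory, key=lambda turn: len(q & tokens(turn)))

def split_into_turns(text: str) -> list[str]:
    """Recast a long-context sample as one sentence per interaction turn."""
    return [s.strip() for s in text.split(".") if s.strip()]

long_context = (
    "Alice moved to Berlin in 2019. "
    "She later adopted a cat named Miso. "
    "In 2023 she switched careers from law to data science."
)
agent = SimpleMemoryAgent()
for turn in split_into_turns(long_context):
    agent.ingest(turn)  # information arrives incrementally, not all at once
print(agent.answer("Who adopted a cat?"))  # -> "She later adopted a cat named Miso"
```

The point of the chunked loop is that, unlike a single long prompt, the agent never sees the full context at once and must manage its own memory across turns.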

Takeaways, Limitations

Takeaways:
  • Presents a new benchmark for evaluating the memory capabilities of LLM agents.
  • Defines four core competencies of memory agents: accurate retrieval, test-time learning, long-range understanding, and selective forgetting (illustrated in the sketch after this section).
  • Simulates the interactive nature of memory agents through a multi-turn format and newly constructed datasets.
  • Identifies the limitations of current methods through evaluation of various memory agents and motivates further research.
Limitations:
  • Lacks detailed information about the composition and characteristics of the benchmark datasets.
  • Lacks detailed information about the types and settings of the memory agents used in the evaluation.
  • The generalizability and applicability of the proposed benchmark to other tasks have not been established.
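As a rough illustration of the four competencies, the sketch below maps each one to the kind of probe question a benchmark might ask after incremental ingestion. The competency names follow the summary above; the probe texts, the `evaluate` helper, and the expected-answer format are invented for this sketch and are not items from MemoryAgentBench.

```python
# Hypothetical probe categories, one per competency named in the summary.
# The questions are invented examples, not items from MemoryAgentBench.
PROBES: dict[str, str] = {
    "accurate_retrieval":
        "What exact figure was quoted in an earlier turn?",
    "test_time_learning":
        "Apply the rule introduced mid-session to this new case.",
    "long_range_understanding":
        "How did the plan change between the first and last turns?",
    "selective_forgetting":
        "The address was corrected later; what is the current one?",
}

def evaluate(agent_answer, expected: dict[str, str]) -> dict[str, bool]:
    """Score one probe per competency with a pluggable answer function."""
    return {
        competency: agent_answer(question) == expected.get(competency)
        for competency, question in PROBES.items()
    }

# Example with a stub answerer that always returns the same string:
scores = evaluate(lambda q: "stub answer", expected={})
print(scores)  # {'accurate_retrieval': False, ...}
```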