
Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

Created by
  • Haebom

Author

Zijian Zhou, Ao Qu, Zhaoxuan Wu, Sunghwan Kim, Alok Prakash, Daniela Rus, Jinhua Zhao, Bryan Kian Hsiang Low, Paul Pu Liang

Outline

This paper addresses the limitations of modern language agents in long-horizon, multi-turn interactions where they must retrieve external information, adapt to observations, and answer interdependent questions. Existing LLM systems rely on full-context prompting, appending all past turns regardless of relevance, which leads to unbounded memory growth, rising computational cost, and degraded reasoning on input lengths outside the training distribution. In response, the paper proposes MEM1, an end-to-end reinforcement learning framework that performs long-horizon, multi-turn tasks with near-constant memory. At each turn, MEM1 updates a compact shared internal state that supports both memory consolidation and reasoning: it integrates new observations from the environment with prior memory while strategically discarding irrelevant or redundant information. The authors also propose a simple, effective, and scalable way to train in more realistic, compositional settings by composing existing datasets into arbitrarily complex task sequences. Experiments across three domains (internal retrieval QA, open-domain web QA, and multi-turn web shopping) show that MEM1-7B improves performance by 3.5x over Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task while reducing memory usage by 3.7x, and that it generalizes beyond the training horizon. The results highlight reasoning-driven memory consolidation as a scalable alternative to existing solutions for training long-horizon agents that optimize both efficiency and performance.
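To make the loop concrete, here is a minimal Python sketch of the constant-memory agent pattern described above. The `llm` and `env` callables are hypothetical stand-ins: in MEM1 the consolidation policy is a single model trained end-to-end with RL, not a prompted API call, so treat this as an illustration of the data flow rather than the paper's implementation.

```python
# Minimal sketch of a MEM1-style constant-memory agent loop.
# `llm` and `env` are hypothetical stand-ins for the trained policy
# and the task environment; their interfaces are assumptions.

def run_episode(llm, env, question: str, max_turns: int = 16) -> str:
    state = ""  # compressed internal state: memory + reasoning, bounded size
    observation = env.reset(question)
    for _ in range(max_turns):
        # One generation step: consolidate the old state with the new
        # observation, keep only task-relevant facts, and emit an action.
        prompt = (
            f"Question: {question}\n"
            f"Internal state: {state}\n"
            f"New observation: {observation}\n"
            "Update the internal state (drop redundant or irrelevant "
            "information) and choose the next action."
        )
        state, action = llm(prompt)  # returns (new_state, action)
        # Only the fresh compressed state is carried forward, so the
        # context stays near-constant instead of growing with every turn.
        observation, done, answer = env.step(action)
        if done:
            return answer
    return state  # fall back to whatever the agent has consolidated
```

The key design point is the last line of the loop body: because the previous turns are never re-appended, the prompt length is bounded by the size of the compressed state rather than by the interaction history.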

Takeaways, Limitations

Takeaways:
We show that reasoning-driven memory consolidation can simultaneously improve the efficiency and performance of long-horizon, multi-turn interactive agents.
MEM1 effectively addresses the memory-growth problem of existing LLM agents and achieves strong performance even under tight memory budgets.
We verify MEM1's ability to generalize through experiments across multiple domains.
We present a method for building scalable multi-turn environments from existing datasets (see the sketch after this list).
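As a rough illustration of that environment-construction idea, the sketch below composes independent single-objective QA items into one multi-objective task, the kind of 16-objective setting the experiments use. The record format, sampling scheme, and question templating are assumptions for illustration, not the paper's exact recipe.

```python
import random

# Illustrative only: composes existing single-objective QA records into a
# longer multi-objective task. The {"question": ..., "answer": ...} record
# shape and the numbering template are assumptions.

def compose_task(dataset: list[dict], num_objectives: int = 16,
                 seed: int | None = None) -> dict:
    rng = random.Random(seed)
    items = rng.sample(dataset, num_objectives)
    # Concatenate independent questions into one long-horizon task; the
    # agent must resolve all objectives across a multi-turn interaction.
    question = " ".join(
        f"({i + 1}) {item['question']}" for i, item in enumerate(items)
    )
    answers = [item["answer"] for item in items]
    return {"question": question, "answers": answers}

# Usage: build a 16-objective task from a HotpotQA-style record list.
# task = compose_task(qa_records, num_objectives=16, seed=0)
```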
Limitations:
The paper gives limited detail on MEM1's internal-state update strategy and its criteria for discarding information.
Evaluation is concentrated on specific datasets and tasks, so generalization to more diverse environments needs further study.
Experimental environments should be extended to cover more complex and diverse interactions.
Further research is needed to optimize memory management strategies.