Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Overflow Prevention Enhances Long-Context Recurrent LLMs

Created by
  • Haebom

Author

Assaf Ben-Kish, Itamar Zimerman, M. Jehanzeb Mirza, Lior Wolf, James Glass, Leonid Karlinsky, Raja Giryes

Outline

This paper studies recent recurrent sub-quadratic models designed for efficient long-context processing. Investigating leading long-context models with a focus on how fixed-size recurrent memory affects performance, the authors find that these models underutilize the long context even when trained on long contexts. They demonstrate that a chunk-based inference procedure, which identifies and processes only the most relevant portion of the input, mitigates recurrent memory overflow and is effective across many long-context tasks. On LongBench, the proposed method improves the performance of Falcon3-Mamba-Inst-7B by 14%, Falcon-Mamba-Inst-7B by 28%, RecurrentGemma-IT-9B by 50%, and RWKV6-Finch-7B by 51%. Remarkably, this simple approach also achieves state-of-the-art results on the demanding LongBench v2 benchmark, competitive with Transformers of the same size. Furthermore, the fact that a single-chunk strategy provides better performance raises the question of whether recurrent models truly exploit long-range dependencies.
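To make the chunk-based idea concrete, here is a minimal Python sketch of the inference flow described above: split the long input into chunks, score each chunk's relevance to the query, and generate the answer from only the most relevant chunk so the fixed-size recurrent memory never has to absorb the full context. The `model.score` / `model.generate` interface and the character-based chunking are assumptions for illustration, not the authors' exact implementation.

```python
from typing import List


def split_into_chunks(context: str, chunk_size: int) -> List[str]:
    """Split a long context into fixed-size chunks (character-based for simplicity)."""
    return [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]


def chunked_inference(model, context: str, query: str, chunk_size: int = 4096) -> str:
    """Answer a query by processing each chunk independently and keeping only
    the chunk the model scores as most relevant, instead of feeding the whole
    context through the recurrent state at once."""
    chunks = split_into_chunks(context, chunk_size)

    # Score each (chunk, query) pair; we assume the model exposes a
    # log-likelihood-style relevance score for the query given a chunk.
    scores = [model.score(chunk + "\n\n" + query) for chunk in chunks]

    # Generate from the single most relevant chunk (single-chunk strategy).
    best_chunk = chunks[scores.index(max(scores))]
    return model.generate(best_chunk + "\n\n" + query)
```

In practice the scoring and generation would be done by a recurrent LLM such as a Mamba or RWKV checkpoint; the key design choice is that decoding conditions on one selected chunk rather than the entire context, which is what prevents the recurrent memory from overflowing.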

Takeaways, Limitations

Takeaways: Chunk-based inference can significantly improve the long-context performance of recurrent models, providing a simple way to make their long-context processing more effective. The approach achieves state-of-the-art performance on LongBench v2 and prompts a rethink of whether recurrent models actually use long-range dependencies.
Limitations: Further research is needed on the generalizability of the proposed method, including experiments across a wider variety of long-context tasks and models, as well as deeper analysis of the memory efficiency of recurrent models.