Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Dynamic Context Compression for Efficient RAG

Created by
  • Haebom

Authors

Shuyu Guo, Zhaochun Ren

Outline

This paper proposes an adaptive context compression framework (ACC-RAG) to address the high inference cost of retrieval-augmented generation (RAG). Unlike existing fixed-rate compression methods, ACC-RAG dynamically adjusts the compression ratio to the complexity of the input question, improving both efficiency and accuracy. By combining a hierarchical compressor with a context selector, it retains only the minimum information needed, mimicking how humans skim text. Experiments on Wikipedia and five question-answering (QA) datasets show that ACC-RAG outperforms fixed-rate compression baselines and achieves inference more than four times faster than standard RAG while maintaining or improving accuracy.
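The adaptive idea can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`complexity_score`, `choose_keep_ratio`, `compress_context`) and the length-based complexity heuristic are assumptions for exposition, whereas the paper's hierarchical compressor and context selector are learned components.

```python
# Hypothetical sketch of adaptive context compression for RAG.
# All names and heuristics here are illustrative assumptions.

def complexity_score(question: str) -> float:
    """Crude proxy for question complexity in [0, 1]:
    longer, multi-clause questions score higher."""
    tokens = question.split()
    clauses = question.count(",") + question.count(" and ") + 1
    return min(1.0, 0.05 * len(tokens) + 0.1 * clauses)

def choose_keep_ratio(question: str, min_keep: float = 0.1,
                      max_keep: float = 0.8) -> float:
    """Map complexity to the fraction of retrieved context to keep,
    so simple questions get aggressive compression."""
    c = complexity_score(question)
    return min_keep + (max_keep - min_keep) * c

def compress_context(chunks, scores, keep_ratio):
    """Keep the highest-scoring chunks within the budget -- a stand-in
    for the paper's hierarchical compressor + context selector."""
    budget = max(1, round(keep_ratio * len(chunks)))
    ranked = sorted(zip(chunks, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:budget]]

chunks = ["doc A", "doc B", "doc C", "doc D"]
scores = [0.9, 0.2, 0.7, 0.4]
ratio = choose_keep_ratio("Who wrote Hamlet?")
print(compress_context(chunks, scores, ratio))  # → ['doc A']
```

A simple factoid question yields a low keep ratio (heavy compression), while a long multi-part question would retain more of the retrieved context before generation.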

Takeaways, Limitations

Takeaways:
  • Presents a novel approach that effectively addresses the inference cost problem of RAG.
  • Dynamically adjusts the compression ratio to input complexity, achieving a balanced improvement in efficiency and accuracy.
  • Delivers much faster inference than the standard RAG method.
  • Demonstrates performance improvements across multiple QA datasets.
Limitations:
  • The reported performance improvements may be biased toward specific datasets and question types.
  • The hierarchical compressor and context selector are complex to design, which can make them difficult to implement and optimize.
  • Further research is needed on performance and scalability in real-world, large-scale deployments.