Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Enhancing RAG Efficiency with Adaptive Context Compression

Created by
  • Haebom

Authors

Shuyu Guo, Shuo Zhang, Zhaochun Ren

Outline

This paper proposes Adaptive Context Compression (ACC) to address the high inference cost of retrieval-augmented generation (RAG). Unlike existing fixed-compression-ratio methods, the resulting system, ACC-RAG, dynamically adjusts the compression ratio to the complexity of the input query, improving both efficiency and accuracy. It combines a hierarchical compressor with a context selector to retain only the minimum necessary information, much as a human skims a text. In experiments on Wikipedia and five question-answering (QA) datasets, ACC-RAG outperforms fixed-compression-ratio baselines and achieves more than a fourfold inference speedup over standard RAG while maintaining or improving accuracy.

Takeaways, Limitations

Takeaways:
Presents a novel approach to effectively addressing the inference cost problem of RAG.
Demonstrates the effectiveness of an adaptive compression technique that dynamically adjusts the compression ratio to input complexity.
Achieves a dramatic improvement in inference speed without compromising accuracy.
Introduces an efficient information-processing method based on hierarchical compression and context selection.
Limitations:
Performance may be limited to the specific datasets evaluated.
Further research is needed on generalizability across different LLMs and RAG systems.
The hierarchical compression and context selection pipeline may itself add computational cost.
Performance evaluation in real-world, large-scale deployments is still required.