Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference

Created by
  • Haebom

Authors

Guangyuan Ma, Yongliang Ma, Xuanrui Gou, Zhenpeng Su, Ming Zhou, Songlin Hu

Outline

This paper proposes LightRetriever to address the efficiency problems of large language model (LLM)-based text retrieval. Existing LLM-based retrievers spend substantial computation on query encoding, which slows serving and consumes resources. LightRetriever keeps a full large LLM for document encoding but reduces query encoding to roughly the cost of an embedding lookup, dramatically improving speed. In experiments on an A800 GPU, query encoding is over 1,000x faster, end-to-end retrieval throughput is over 10x higher, and an average of about 95% of retrieval performance is retained across a range of tasks.
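To make the core idea concrete, here is a minimal sketch (not the authors' code) of what "query encoding as an embedding lookup" means in practice: documents are encoded offline with a full encoder forward pass, while a query is represented by pooling its token embeddings taken directly from the model's embedding table, skipping the transformer layers entirely. The model name, pooling, and dot-product similarity below are illustrative stand-ins; in LightRetriever the components are trained so that lookup-based query vectors align with full-model document vectors, and the paper's hybrid (dense plus sparse) setup is omitted here.

```python
# Illustrative sketch only, assuming a generic Hugging Face encoder as a stand-in
# for the paper's LLM. Not the authors' implementation.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

@torch.no_grad()
def encode_document(text: str) -> torch.Tensor:
    """Full forward pass + mean pooling (run offline, once per document)."""
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = model(**batch).last_hidden_state             # (1, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)           # (1, seq_len, 1)
    vec = (hidden * mask).sum(1) / mask.sum(1)             # mean over real tokens
    return torch.nn.functional.normalize(vec, dim=-1)

@torch.no_grad()
def encode_query(text: str) -> torch.Tensor:
    """No transformer forward pass: pool token embeddings straight from the table."""
    ids = tokenizer(text, return_tensors="pt", truncation=True)["input_ids"]
    emb = model.get_input_embeddings()(ids)                # embedding lookup only
    return torch.nn.functional.normalize(emb.mean(dim=1), dim=-1)

docs = ["LightRetriever speeds up LLM-based retrieval.",
        "Unrelated text about cooking pasta."]
doc_vecs = torch.cat([encode_document(d) for d in docs])   # precomputed index
q_vec = encode_query("fast LLM retrieval")                  # near-instant at query time
print(q_vec @ doc_vecs.T)                                   # similarity scores
```

Because the query path touches only the embedding table, its cost is essentially independent of model depth, which is where the reported 1,000x query-encoding speedup comes from; the expensive full-model work is shifted entirely to offline document indexing.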

Takeaways and Limitations

Takeaways:
Introduces LightRetriever, which dramatically improves the speed and efficiency of LLM-based retrieval.
Minimizes the computational burden of query encoding, making the approach more practical for real-time search systems.
Achieves large speedups while maintaining high retrieval performance on large datasets.
Reduces resource consumption through lightweight query encoding.
Limitations:
Because query encoding relies on embedding lookups, embedding quality can strongly affect retrieval performance.
There is no guarantee that the method performs equally well for all types of search queries.
Document encoding still uses a large LLM, so its computational cost remains significant.
The reported speedups were measured on specific hardware (an A800 GPU), so results may vary in other environments.