Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems

Created by
  • Haebom

Authors

Congmin Min, Rhea Mathew, Joyce Pan, Sahil Bansal, Abbas Keshavarzi, Amar Viswanathan Kannan

Outline

This paper proposes a framework for scalable, cost-effective deployment of graph-based Retrieval-Augmented Generation (GraphRAG) in enterprise environments. Adoption of GraphRAG has been limited by its high computational cost and latency, so the authors present two key innovations: (1) a dependency-based knowledge graph construction pipeline that extracts entities and relationships from unstructured text using industry-grade NLP libraries rather than large language models (LLMs), and (2) a lightweight graph search strategy that combines hybrid query node identification with efficient one-step traversal to extract subgraphs with high recall and low latency. Experiments on an SAP dataset show improvements of up to 15% (LLM-as-Judge) and 4.35% (RAGAS) over existing RAG baselines. The dependency-based approach reaches 94% of the performance of LLM-based knowledge graphs (61.87% vs. 65.83%) while significantly reducing cost and improving scalability. These results demonstrate the feasibility of a practical, explainable, and domain-adaptive Retrieval-Augmented Reasoning system.
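The two ideas in the outline above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the `Token` structure and the hand-written parses stand in for the output of an NLP library's dependency parser, only `nsubj` and `dobj` relations are used, and the hybrid query node identification is reduced to a plain exact word match. All function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "ROOT"
    head: int  # index of this token's head within the sentence


def extract_triples(tokens):
    """Emit a (subject, verb, object) triple for each verb that has both."""
    subjects, objects = {}, {}
    for tok in tokens:
        if tok.dep == "nsubj":
            subjects[tok.head] = tok.text
        elif tok.dep == "dobj":
            objects[tok.head] = tok.text
    return [(subjects[i], tokens[i].text, objects[i])
            for i in sorted(subjects) if i in objects]


def build_graph(triples):
    """Index triples by both endpoints so a one-hop lookup is one dict access."""
    graph = defaultdict(list)
    for s, r, o in triples:
        graph[s.lower()].append((s, r, o))
        graph[o.lower()].append((s, r, o))
    return graph


def find_query_nodes(query, graph):
    """Assumed stand-in for hybrid node identification: exact word match only."""
    words = {w.strip("?.,!").lower() for w in query.split()}
    return [w for w in sorted(words) if w in graph]


def one_hop_subgraph(query, graph):
    """Collect every triple that touches a matched query node, and nothing deeper."""
    seen, result = set(), []
    for node in find_query_nodes(query, graph):
        for triple in graph[node]:
            if triple not in seen:
                seen.add(triple)
                result.append(triple)
    return result


# Hand-written parses (an assumption; a real parser would produce these).
sent1 = [Token("Scheduler", "nsubj", 1), Token("triggers", "ROOT", 1),
         Token("Backup", "dobj", 1)]
sent2 = [Token("Backup", "nsubj", 1), Token("writes", "ROOT", 1),
         Token("Archive", "dobj", 1)]

triples = extract_triples(sent1) + extract_triples(sent2)
graph = build_graph(triples)
subgraph = one_hop_subgraph("What does the Backup do?", graph)
```

No LLM call appears anywhere in the construction step, which is the source of the cost savings the summary describes. The retrieved triples would then be passed to the generator as context.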

Takeaways, Limitations

Takeaways:
Reduces reliance on LLMs and shows that cost-effective GraphRAG deployment is feasible.
Proposes an efficient knowledge graph construction pipeline that leverages industry-grade NLP libraries.
Achieves high performance and low latency with a lightweight graph search strategy.
Demonstrates the applicability of GraphRAG in a real-world, large-scale enterprise environment.
Points toward explainable, domain-adaptive Retrieval-Augmented Reasoning systems.
Limitations:
The reported results come from evaluation on a specific SAP dataset; generalizability to other domains and datasets requires further research.
The dependency-based knowledge graph construction method scores slightly below the LLM-based method (61.87% vs. 65.83%), and this gap remains to be narrowed.
Because the search strategy relies on one-step traversal, performance may degrade on complex questions that require multi-step inference; improving multi-step inference requires further research.
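The last limitation can be made concrete with a small example. The sketch below is an assumption-laden illustration, not the paper's method: `k_hop_edges` is a hypothetical breadth-first helper over an invented two-edge graph, and edges are treated as undirected. It shows that a fact two edges away from the query node is not retrieved with one step.

```python
from collections import deque


def k_hop_edges(edges, start, k):
    """Return the edges reachable within k hops of start (edges are undirected)."""
    adjacency = {}
    for edge in edges:
        s, _, o = edge
        adjacency.setdefault(s, []).append((o, edge))
        adjacency.setdefault(o, []).append((s, edge))

    depth = {start: 0}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond the hop limit
        for neighbor, edge in adjacency.get(node, []):
            if edge not in found:
                found.append(edge)
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return found


# Invented toy graph for illustration only.
edges = [("Scheduler", "triggers", "Backup"),
         ("Backup", "writes", "Archive")]

one_hop = k_hop_edges(edges, "Scheduler", 1)
two_hop = k_hop_edges(edges, "Scheduler", 2)
```

A question such as "Where does the job triggered by the Scheduler write its output?" needs both edges, so `one_hop` alone would leave the generator without the second fact. Raising `k` recovers it, at the price of larger subgraphs and higher latency.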