This paper points out the lack of rigorous evaluation methodology for knowledge-graph-based retrieval-augmented generation (KG-RAG) models and presents a novel benchmark construction method and evaluation protocol to systematically assess their inference ability under knowledge incompleteness. The authors observe that existing benchmarks include questions that can be answered by directly looking up triplets already present in the knowledge graph, which makes it difficult to evaluate the models' actual inference ability. In addition, inconsistent evaluation metrics and lenient answer-matching criteria hinder meaningful comparison between models. Experimental results show that existing KG-RAG methods exhibit limited inference ability when the required knowledge is missing, tend to rely on internal memory, and generalize to varying degrees depending on their design.
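To make the benchmark idea concrete, below is a minimal sketch (not the paper's code) of how one might simulate knowledge incompleteness by removing the gold triplet a question depends on, and filter out questions that a direct triplet lookup could already answer. All names (Triple, is_directly_answerable, build_incomplete_kg) and the toy data are hypothetical, chosen for illustration only.

```python
from typing import Set, Tuple

# A knowledge-graph triplet: (head entity, relation, tail entity)
Triple = Tuple[str, str, str]

def is_directly_answerable(question_triple: Triple, kg: Set[Triple]) -> bool:
    """A question counts as 'directly answerable' if its supporting triplet
    is already stored in the knowledge graph (no inference needed)."""
    return question_triple in kg

def build_incomplete_kg(kg: Set[Triple], gold_triples: Set[Triple]) -> Set[Triple]:
    """Drop the gold triplets so a model must infer the answer from the
    remaining graph (or its internal memory) instead of retrieving it."""
    return kg - gold_triples

# Toy example
kg = {
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
}
gold = {("Paris", "capital_of", "France")}

# True -> this question would be filtered out of the benchmark as-is
print(is_directly_answerable(("Paris", "capital_of", "France"), kg))

# After removing the gold triplet, answering requires inference
incomplete_kg = build_incomplete_kg(kg, gold)
print(("Paris", "capital_of", "France") in incomplete_kg)  # False
```

Under this kind of setup, a stricter answer-matching rule (e.g., exact entity match rather than substring overlap) would be applied when scoring model outputs, which is the sort of protocol choice the paper argues is needed for meaningful comparison.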
Takeaways, Limitations
• Takeaways: The paper presents a new benchmark and evaluation protocol for objectively assessing the inference ability of KG-RAG models under knowledge incompleteness, provides an empirical analysis of the inference and generalization capabilities of existing KG-RAG models, and suggests directions for their further development and improvement.
• Limitations: Further research is needed on the generalizability of the proposed benchmark and evaluation protocol, and its applicability to various types of knowledge graphs and KG-RAG models has yet to be verified. The objectivity and reliability of the new evaluation method also require further review.