Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs

Created by
  • Haebom

Author

Zhiqiang Liu, Enpei Niu, Yin Hua, Mengshu Sun, Lei Liang, Huajun Chen, Wen Zhang

Outline

This paper proposes SKA-Bench, a benchmark for fine-grained evaluation of the structured knowledge (SK) understanding of large language models (LLMs). SKA-Bench covers four SK forms (knowledge graphs (KGs), tables, KG+text, and table+text) and constructs instances through a three-stage pipeline, each instance consisting of a question, an answer, positive knowledge units, and noisy knowledge units. To evaluate SK understanding at a finer granularity, these instances are expanded into four fundamental ability testbeds: noise robustness, order insensitivity, information integration, and negative rejection. Experiments on eight representative LLMs show that existing LLMs still struggle with structured knowledge understanding, and that their performance is affected by factors such as the amount of noise, the order of knowledge units, and hallucination. The dataset and code are available on GitHub.
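The instance format described above (question, answer, positive knowledge units, noisy knowledge units) and the noise-robustness testbed can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the actual SKA-Bench code: the class name, field names, and prompt layout are hypothetical, and the paper's real pipeline and formats may differ.

```python
from dataclasses import dataclass
import random

@dataclass
class SKAInstance:
    """Hypothetical benchmark instance, mirroring the structure described in the summary."""
    question: str
    answer: str
    positive_units: list[str]  # knowledge units that support the answer
    noisy_units: list[str]     # distractor units irrelevant to the answer

def build_noise_robustness_prompt(inst: SKAInstance, num_noisy: int, seed: int = 0) -> str:
    """Mix a controlled number of noisy units with the supporting units and
    shuffle them, so model accuracy can be compared across noise levels."""
    rng = random.Random(seed)
    units = inst.positive_units + rng.sample(inst.noisy_units, num_noisy)
    rng.shuffle(units)  # shuffling also supports an order-insensitivity variant
    context = "\n".join(f"- {u}" for u in units)
    return f"Knowledge:\n{context}\n\nQuestion: {inst.question}"

# Toy example instance (invented for illustration)
inst = SKAInstance(
    question="Which river flows through Paris?",
    answer="Seine",
    positive_units=["(Seine, flows_through, Paris)"],
    noisy_units=["(Thames, flows_through, London)", "(Danube, flows_through, Vienna)"],
)
print(build_noise_robustness_prompt(inst, num_noisy=2))
```

A negative-rejection testbed would follow the same pattern but supply only noisy units, checking that the model declines to answer rather than hallucinating.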

Takeaways, Limitations

Takeaways:
Provides a comprehensive and rigorous benchmark for assessing LLMs' structured knowledge understanding.
Its coverage of multiple structured knowledge types allows precise diagnosis of an LLM's weaknesses.
The four ability testbeds enable fine-grained analysis of how LLMs understand structured knowledge.
Clearly demonstrates the limitations of existing LLMs' structured knowledge comprehension.
Limitations:
The set of LLMs evaluated in the benchmark may be limited.
Further research may be needed on SKA-Bench's evaluation metrics and measurement methods.
The benchmark may be biased toward certain types of structured knowledge.