Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Know Or Not: a library for evaluating out-of-knowledge base robustness

Created by
  • Haebom

Authors

Jessica Foo, Pradyumna Shyama Prasad, Shaun Khoo

Outline

In this paper, we present a novel evaluation methodology for the hallucination problem of large language models (LLMs), focusing on questions that fall outside the knowledge base in the retrieval-augmented generation (RAG) setting. We introduce knowornot, an open-source library that replaces traditional manual annotation with automated evaluation, and show that it can be used to systematically evaluate out-of-knowledge base (OOKB) robustness. knowornot supports the development of custom evaluation data and pipelines, and provides a unified API, a modular architecture, rigorous data modeling, and a variety of customization tools. We demonstrate the utility of knowornot by developing PolicyBench, a benchmark covering four question-answering chatbots on government policies. The source code of knowornot is available on GitHub.
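
To make the idea of OOKB robustness concrete, here is a minimal Python sketch. It is not the knowornot API: every name in it (QAItem, leave_one_out_contexts, abstention_rate, the two toy bots) is hypothetical. It also assumes a leave-one-out setup, in which the knowledge-base entry that answers a question is withheld before the question is asked; the summary above does not spell out the library's actual procedure. Under that assumption, a robust system should abstain rather than answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple

ABSTAIN = "I don't know"  # hypothetical abstention string, chosen for this sketch


@dataclass(frozen=True)
class QAItem:
    question: str
    fact_id: str  # id of the single knowledge-base entry that answers the question


def leave_one_out_contexts(
    kb: Dict[str, str], items: List[QAItem]
) -> Iterator[Tuple[QAItem, List[str]]]:
    """Yield (item, context) pairs where the answering fact has been withheld."""
    for item in items:
        context = [text for fid, text in kb.items() if fid != item.fact_id]
        yield item, context


def abstention_rate(
    kb: Dict[str, str],
    items: List[QAItem],
    answer_fn: Callable[[str, List[str]], str],
) -> float:
    """Fraction of out-of-knowledge-base questions on which answer_fn abstains."""
    if not items:
        return 0.0
    abstained = 0
    for item, context in leave_one_out_contexts(kb, items):
        reply = answer_fn(item.question, context)
        if reply.strip().lower() == ABSTAIN.lower():
            abstained += 1
    return abstained / len(items)


def overconfident_bot(question: str, context: List[str]) -> str:
    """Toy stand-in for an LLM that always answers, even without support."""
    return context[0] if context else "Yes."


def cautious_bot(question: str, context: List[str]) -> str:
    """Toy stand-in that answers only if a context entry shares a long word."""
    q_words = {w.strip("?.,").lower() for w in question.split() if len(w) > 4}
    for text in context:
        t_words = {w.strip("?.,").lower() for w in text.split()}
        if q_words & t_words:
            return text
    return ABSTAIN


# Made-up toy data, for illustration only.
kb = {
    "f1": "The housing grant is capped at 80000 dollars.",
    "f2": "Parental leave lasts sixteen weeks.",
}
items = [
    QAItem("What is the housing grant cap?", "f1"),
    QAItem("How long is parental leave?", "f2"),
]

cautious_rate = abstention_rate(kb, items, cautious_bot)
overconfident_rate = abstention_rate(kb, items, overconfident_bot)
```

In a real pipeline the toy bots would be replaced by calls to a RAG chatbot, and exact string matching on the abstention phrase would likely be replaced by a more tolerant judge; both choices here are simplifications.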

Takeaways, Limitations

Takeaways:
Provides a novel methodology and an open-source tool (knowornot) for evaluating the OOKB robustness of LLMs against hallucination, especially in the RAG setting.
Enables automated OOKB robustness assessment without manual annotation.
Offers a flexible and scalable platform with customization capabilities.
Demonstrates the usefulness of knowornot through a real-world benchmark (PolicyBench).
Limitations:
More extensive experimental and comparative studies on the performance and efficiency of knowornot are needed.
PolicyBench is limited to government policies, so generalizability to other domains needs to be verified.
There may be a subjective aspect to the definition and measurement of hallucinations.
Further research is needed to increase the objectivity and reliability of the evaluation.