Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation

Created by
  • Haebom

Author

Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo, Shihao Liu, Daiting Shi, Dawei Yin, Xueqi Cheng

Outline

This paper explores leveraging large language models (LLMs) to annotate document utility, reducing the reliance on expensive manual annotation when training retrieval and retrieval-augmented generation (RAG) systems. To bridge the gap between retrieval relevance and generative utility, the authors use LLMs to annotate how useful each document is for answering a query. To effectively exploit multiple positive samples per query, they propose a novel loss function that maximizes the aggregated marginal likelihood of those positives. They use the Qwen-2.5-32B model to annotate the MS MARCO dataset for utility, then conduct retrieval experiments on MS MARCO and BEIR and RAG experiments on MS MARCO QA, NQ, and HotpotQA. The results show that LLM-generated annotations improve out-of-domain retrieval performance and RAG results compared to models trained solely on manual annotations or on subsets selected via QA metrics. Furthermore, combining LLM annotations with just 20% of the manual annotations achieves performance comparable to using fully manual annotations. The study presents a comprehensive approach for leveraging LLM annotations to bootstrap QA systems on new corpora.
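The "aggregated marginal likelihood" loss can be sketched as a multi-positive variant of a softmax contrastive loss: the probability mass of all positive documents for a query is summed inside the log, rather than averaging a separate loss per positive. The sketch below is an illustration under that assumption; the function name, temperature parameter, and exact formulation are not taken from the paper.

```python
import numpy as np

def multi_positive_nll(scores, positive_mask, temperature=1.0):
    """Negative log of the aggregated probability mass assigned to
    all positive documents for one query.

    scores        : (N,) similarity scores for N candidate documents
    positive_mask : (N,) boolean array, True where a document is positive

    Note: this is an illustrative sketch, not the paper's exact loss.
    """
    logits = np.asarray(scores, dtype=float) / temperature
    logits -= logits.max()                  # numerical stability
    exp = np.exp(logits)
    # Sum the positives' mass *inside* the log (marginal likelihood),
    # instead of averaging per-positive cross-entropy terms.
    p_positives = exp[positive_mask].sum() / exp.sum()
    return -np.log(p_positives)
```

With this formulation, the loss goes to zero as soon as the positives jointly absorb the probability mass, so the model is not pushed to rank one annotated-useful document above another.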

Takeaways, Limitations

Takeaways:
Utility annotation with LLMs reduces reliance on manual annotation and enables cost-effective construction of QA systems.
LLM annotations improve out-of-domain retrieval performance and RAG performance.
Combining a small amount of manual annotation with LLM annotations achieves high performance.
The paper presents an effective method for bootstrapping a QA system on a new corpus.
Limitations:
Further research is needed on the accuracy and reliability of LLM-generated utility annotations.
The generalizability of the results beyond the specific LLM (Qwen-2.5-32B) and datasets used remains to be validated.
Further experiments with different question types and datasets are needed.