Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Reliable Evaluation Protocol for Low-Precision Retrieval

Created by
  • Haebom

Authors

Kisu Yang, Yoonna Jang, Hwanseok Jang, Kenneth Choi, Isabelle Augenstein, Heuiseok Lim

Outline

Low-precision computation, which reduces the numerical precision of model parameters and operations, is widely used to improve the efficiency of retrieval systems. However, at low precision this approach often produces excessive ties in query-document relevance scores, increasing the variability of results and reducing evaluation reliability. To address this, the authors propose a more robust retrieval evaluation protocol designed to reduce score variability. The protocol has two components: High Precision Scoring (HPS), which upscales the final score computation to high precision so that tied candidates are resolved at minimal computational cost, and the Tie-Aware Retrieval Metric (TRM), which quantifies ordering uncertainty by reporting the expected value, range, and bias of metrics over tied candidates. Experiments show that HPS significantly reduces tie-induced instability and that TRM accurately recovers expected metric values; together they enable a more consistent and reliable evaluation of low-precision retrieval.
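
To make the HPS idea concrete, here is a minimal Python sketch under an assumed setup: a dense retriever whose embeddings are quantized to int8, where integer dot products collide and produce ties. The function names, the int8 scheme, and the shortlist-then-rescore flow are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of High Precision Scoring (HPS): score at low
# precision, then rescore only the shortlisted candidates at float64
# so that tied integer scores are broken at minimal extra cost.
import numpy as np

def quantize_int8(x, scale=127.0):
    """Symmetric int8 quantization of (roughly unit-norm) embeddings."""
    return np.clip(np.round(x * scale), -127, 127).astype(np.int8)

def hps_retrieve(query, docs, top_k=10):
    q_lp, d_lp = quantize_int8(query), quantize_int8(docs)
    # Low-precision relevance scores: many documents collapse onto the
    # same integer value, which is the source of ties.
    lp_scores = d_lp.astype(np.int32) @ q_lp.astype(np.int32)
    shortlist = np.argsort(-lp_scores, kind="stable")[:top_k]
    # HPS step: recompute only the shortlist's scores at high precision;
    # this is cheap relative to scoring the full collection.
    hp_scores = docs[shortlist].astype(np.float64) @ query.astype(np.float64)
    return shortlist[np.argsort(-hp_scores, kind="stable")]

# Example: ties at int8 precision are resolved by the float64 rescoring.
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 64)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[0] + 1e-3 * rng.standard_normal(64).astype(np.float32)
print(hps_retrieve(query, docs, top_k=5))
```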

Takeaways, Limitations

Takeaways:
We present a novel evaluation protocol (HPS and TRM) that improves the reliability of low-precision retrieval evaluation.
We experimentally demonstrate that High Precision Scoring (HPS) dramatically reduces tie-induced variability in results.
The Tie-Aware Retrieval Metric (TRM) quantifies the ordering uncertainty of tied candidates and accurately estimates expected metric values (a hedged sketch follows this list).
We propose a method that simultaneously improves the efficiency and the evaluation reliability of low-precision retrieval systems.
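
As a companion illustration of the tie-aware idea, the sketch below computes reciprocal rank without committing to one arbitrary ordering of tied candidates: it reports the expectation and range of the metric over all orderings of the tie group. The function name, the single-relevant-document setup, and the "optimistic bias" reading are assumptions for illustration, not the paper's exact TRM definition.

```python
# Hypothetical tie-aware reciprocal rank: given (doc_id, score) pairs with
# exact ties (e.g., quantized integer scores), the relevant document is
# equally likely to land anywhere in its tie group, so we report the
# expected metric, its attainable range, and the bias of an optimistic
# (relevant-first) tie-break relative to that expectation.
def tie_aware_rr(scored_docs, relevant_id):
    rel_score = dict(scored_docs)[relevant_id]
    better = sum(1 for _, s in scored_docs if s > rel_score)
    tied = sum(1 for _, s in scored_docs if s == rel_score)  # includes the relevant doc
    first, last = better + 1, better + tied                  # rank span of the tie group
    expected = sum(1.0 / r for r in range(first, last + 1)) / tied
    best, worst = 1.0 / first, 1.0 / last
    return expected, (worst, best), best - expected

# Example: the relevant doc ties with two others for ranks 2-4.
scored = [("d1", 9), ("d2", 8), ("rel", 8), ("d3", 8), ("d4", 5)]
print(tie_aware_rr(scored, "rel"))  # (~0.361, (0.25, 0.5), ~0.139)
```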
Limitations:
The effectiveness of the proposed method is demonstrated on specific retrieval datasets and models; its generalizability to other datasets and models requires further research.
High Precision Scoring (HPS) incurs additional computational cost, but a quantitative analysis of its cost-effectiveness is lacking.
Comparative analysis against other types of tie-breaking strategies is limited.