
Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking

Created by
  • Haebom

Authors

Mubashara Akhtar, Michael Schlichtkrull, Andreas Vlachos

Outline

This paper points out the limitations of the evidence evaluation methods used in existing automated fact-checking (AFC) and proposes a new evaluation metric, Ev²R. Existing methods assess the validity of retrieved evidence either by the accuracy of the resulting verdict or by exact matches against a predefined knowledge base (e.g., Wikipedia), so they are constrained both by metrics designed for other purposes and by the coverage of that knowledge base. Ev²R addresses these shortcomings by combining a reference-based evaluation with a verdict-based (proxy) score, simultaneously measuring how well the retrieved evidence matches the reference evidence and how reliably it supports the predicted verdict. Experiments show that Ev²R outperforms existing methods in accuracy and robustness; in particular, it correlates highly with human judgments and is resistant to adversarial attacks.
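To make the combination concrete, below is a minimal Python sketch of how a reference-based score and a verdict-based score might be mixed into a single evidence score. The function names, the token-overlap stand-in for the LLM judge, and the mixing weight `alpha` are illustrative assumptions, not the paper's implementation (Ev²R uses prompted LLM evaluators for the reference comparison).

```python
# Hypothetical sketch of an Ev2R-style combined evidence score.
# The scorer internals and the mixing weight are assumptions for illustration.

from dataclasses import dataclass


@dataclass
class Example:
    claim: str
    retrieved: list[str]   # evidence sentences returned by the AFC system
    reference: list[str]   # gold/reference evidence sentences
    predicted_label: str   # verdict produced from the retrieved evidence
    gold_label: str        # reference verdict


def reference_agreement(retrieved: list[str], reference: list[str]) -> float:
    """Stand-in for an LLM judge scoring how well the retrieved evidence
    covers the reference evidence; simple token overlap is used here only
    to keep the sketch self-contained."""
    ref_tokens = set(" ".join(reference).lower().split())
    ret_tokens = set(" ".join(retrieved).lower().split())
    return len(ref_tokens & ret_tokens) / max(len(ref_tokens), 1)


def verdict_support(predicted_label: str, gold_label: str) -> float:
    """Proxy score: does the evidence lead the system to the correct verdict?"""
    return 1.0 if predicted_label == gold_label else 0.0


def ev2r_score(ex: Example, alpha: float = 0.5) -> float:
    """Combine the reference-based and verdict-based components
    (alpha is an assumed mixing weight, not taken from the paper)."""
    return (alpha * reference_agreement(ex.retrieved, ex.reference)
            + (1 - alpha) * verdict_support(ex.predicted_label, ex.gold_label))


if __name__ == "__main__":
    ex = Example(
        claim="The Eiffel Tower is in Berlin.",
        retrieved=["The Eiffel Tower is located in Paris, France."],
        reference=["The Eiffel Tower stands on the Champ de Mars in Paris."],
        predicted_label="refuted",
        gold_label="refuted",
    )
    print(f"Ev2R-style score: {ev2r_score(ex):.2f}")
```

In the paper's actual setup, the token-overlap placeholder would be replaced by a prompted LLM evaluator that judges semantic coverage rather than lexical overlap.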

Takeaways, Limitations

Takeaways:
Presents a new metric, Ev²R, that overcomes the limitations of existing evidence evaluation methods in automated fact-checking (AFC).
Ev²R combines reference-based evaluation with verdict-based scoring, enabling more accurate and robust evidence assessment.
It correlates highly with human judgment and is robust against adversarial attacks.
It can contribute to evidence evaluation and model improvement in the AFC field.
Limitations:
The reported performance of Ev²R may be limited to the specific datasets and settings used; additional experiments across more diverse datasets and environments are needed.
The computational complexity and efficiency of Ev²R are not analyzed in depth; it may be computationally expensive in practical applications.
Further research is needed on how well the findings generalize to other types of fact-checking problems and evidence.