Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Quantifying Calibration Error in Neural Networks Through Evidence-Based Theory

Created by
  • Haebom

Authors

Koffi Ismael Ouattara, Ioannis Krontiris, Theo Dimitrakos, Frank Kargl

Outline

This paper proposes a framework that incorporates subjective logic into the expected calibration error (ECE) to assess the reliability of neural networks. Conventional metrics such as accuracy and precision do not adequately capture trust, confidence, and uncertainty, and in particular do not address overconfidence. The proposed method clusters predicted probabilities and applies appropriate fusion operators to measure trust, distrust, and uncertainty together. Experiments on the MNIST and CIFAR-10 datasets show improved reliability after calibration. The framework is intended to support interpretable, fine-grained evaluation of AI models in sensitive areas such as healthcare and autonomous systems.
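
For reference, the baseline quantity the framework builds on can be sketched as follows. This is the standard binned ECE (the sample-weighted mean of |accuracy - confidence| over confidence bins), not the paper's subjective-logic variant. The function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE (illustrative sketch, equal-width bins assumed).

    confidences: predicted probability of the chosen class, in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; np.digitize maps each confidence to a bin index
    # in 0..n_bins-1, with bins closed on the right.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        accuracy = correct[mask].mean()
        avg_confidence = confidences[mask].mean()
        # Weight the gap by the fraction of samples that fall in this bin.
        ece += mask.mean() * abs(accuracy - avg_confidence)
    return float(ece)
```

An overconfident model, for example one that reports 0.9 confidence but is right only 75% of the time, yields a positive ECE equal to that gap. A model whose confidence matches its accuracy in every bin yields 0.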

Takeaways, Limitations

Takeaways:
The paper introduces subjective logic into reliability assessment, yielding a framework that considers trust, distrust, and uncertainty together.
It helps address overconfidence, which existing metrics fail to capture.
The framework could improve the reliability and interpretability of AI models in sensitive fields such as healthcare and autonomous driving.
The effectiveness of the proposed method is verified through experiments on the MNIST and CIFAR-10 datasets.
Limitations:
Further research is needed on the generalization performance of the proposed framework.
The experiments need to be extended to more diverse datasets and models.
Further research is needed on parameter settings of subjective logic and selection of fusion operators.
Further research and validation are needed for practical applications.
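
To make the point about subjective-logic parameters and fusion operators more concrete, here is a minimal sketch of two textbook building blocks of subjective logic: mapping evidence counts to a binomial opinion (belief, disbelief, uncertainty) and combining two opinions with cumulative fusion. How the paper derives evidence from clustered probabilities, and which operators it actually selects, is not stated in this summary. The class and function names and the prior weight of 2 are therefore assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

# Non-informative prior weight; 2 is the conventional choice for
# binomial opinions, assumed here for illustration.
PRIOR_WEIGHT = 2.0


@dataclass(frozen=True)
class Opinion:
    belief: float
    disbelief: float
    uncertainty: float  # belief + disbelief + uncertainty == 1


def opinion_from_evidence(positive, negative, prior_weight=PRIOR_WEIGHT):
    """Map counts of supporting and contradicting evidence to an opinion."""
    total = positive + negative + prior_weight
    return Opinion(positive / total, negative / total, prior_weight / total)


def cumulative_fusion(a, b):
    """Cumulative fusion of two opinions built from independent evidence."""
    denom = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if denom == 0.0:
        # Both opinions are dogmatic (zero uncertainty); the limit form
        # of the operator is needed and is not covered by this sketch.
        raise ValueError("cannot fuse two opinions with zero uncertainty")
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / denom,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / denom,
        (a.uncertainty * b.uncertainty) / denom,
    )
```

With this operator, fusing the opinions built from (3 positive, 1 negative) and (1 positive, 1 negative) observations gives the same result as building one opinion from the pooled counts (4 positive, 2 negative). This is why cumulative fusion suits independent evidence. Other operators, such as averaging fusion, behave differently, which is where the choice of operator and prior weight becomes a real parameter question.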