Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Benchmarking XAI Explanations with Human-Aligned Evaluations

Created by
  • Haebom

Authors

Rémi Kazmierczak, Steve Azzolin, Eloïse Berthier, Anna Hedström, Patricia Delhomme, David Filliat, Nicolas Bousquet, Goran Frehse, Massimiliano Mancini, Baptiste Caramiaux, Andrea Passerini, Gianni Franchi

Overview

PASTA (Perceptual Assessment System for Explanation of Artificial Intelligence) is a novel human-centered framework for evaluating explainable AI (XAI) techniques in computer vision. The paper presents the PASTA-dataset, a large-scale benchmark that covers a range of models and explanation methods, both saliency-based and concept-based. Because the dataset is built on human judgments, it supports robust and comparable analysis of XAI techniques. The paper also presents the PASTA-score, an automated, data-driven benchmark that uses the PASTA-dataset to predict human preferences, giving scalable, reliable, and consistent assessments that align with human perception. The authors propose using it to investigate the interpretability of existing models and to build XAI methods that are easier for humans to interpret. Unlike existing research, the framework allows explanations to be compared across different modalities.
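One common way to check whether an automated score "aligns with human perception" is to measure the rank correlation between its outputs and human ratings of the same explanations. The sketch below illustrates that idea only: it is not the PASTA-score and not the authors' evaluation protocol, and the function names (`ranks`, `spearman`) and the rating values are hypothetical, made up for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have the same length, at least 2")
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: mean human ratings for five explanations, and the
# values an automated scorer assigned to the same five explanations.
human_ratings = [4.2, 1.5, 3.0, 2.1, 4.8]
predicted_scores = [3.9, 1.8, 3.2, 2.6, 4.5]

rho = spearman(human_ratings, predicted_scores)
print(f"Spearman correlation: {rho:.3f}")
```

A value near 1 means the automated scorer orders explanations the way the human raters do; a value near 0 means its ordering is unrelated to theirs. Rank correlation is used here because it compares orderings rather than raw values, so the two scales do not need to match.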

Takeaways, Limitations

Takeaways:
Provides a large-scale benchmark dataset (PASTA-dataset) for comparative analysis of diverse XAI techniques.
Presents an automated evaluation metric (PASTA-score) that aligns with human perception.
Enables comparison of XAI explanations across different modalities.
Contributes to improving the interpretability of existing XAI methods and to developing new ones.
Limitations:
Further research is needed to determine how well the PASTA-dataset generalizes.
The accuracy and reliability of the PASTA-score need further validation.
The evaluation method is limited to specific computer vision domains.
Further research is needed on its applicability to other AI fields (natural language processing, reinforcement learning, etc.).