Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations

Created by
  • Haebom

Authors

Noah Y. Siegel, Nicolas Heess, Maria Perez-Ortiz, Oana-Maria Camburu

Outline

The paper analyzes whether the explanations LLMs give for their decisions faithfully reflect the factors that actually drive those decisions (faithfulness). It evaluates counterfactual faithfulness across 75 models from 13 families, examining the tradeoff between conciseness and comprehensiveness, how correlational faithfulness metrics should be assessed, and their susceptibility to manipulation. Two new metrics are proposed: phi-CCT, a simplified variant of the Correlational Counterfactual Test (CCT), and F-AUROC. The results show that larger, better-performing models consistently score higher on faithfulness metrics.
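As an illustration only (the paper's exact metric definitions are not reproduced here), a correlational counterfactual test can be thought of as measuring agreement between two binary observations per example: whether the explanation mentions a factor, and whether a counterfactual edit to that factor flips the model's decision. A minimal sketch using the standard phi coefficient over such hypothetical pairs:

```python
import math

def phi_coefficient(pairs):
    """Phi coefficient (Matthews correlation for a 2x2 table) between
    two binary variables, given a list of (a, b) pairs with a, b in {0, 1}."""
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical data: (explanation mentions factor, counterfactual edit flips decision)
pairs = [(1, 1), (1, 1), (0, 0), (1, 0), (0, 0), (0, 1), (1, 1), (0, 0)]
print(phi_coefficient(pairs))  # → 0.5
```

A phi of 1.0 would mean mentioned factors perfectly predict decision-flipping edits; values near 0 indicate explanations uncorrelated with the model's actual sensitivities. How the paper aggregates such signals into phi-CCT and F-AUROC is an assumption not covered by this sketch.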

Takeaways, Limitations

Takeaways:
Larger LLMs provide more faithful explanations of their decisions.
Phi-CCT and F-AUROC can serve as new metrics for assessing faithfulness.
Limitations:
The analysis may be limited to the specific models and metrics studied.
It may not cover all the aspects relevant to assessing faithfulness.
It may not provide a fundamental understanding of how models generate explanations.