Daily Arxiv

This page collects papers on artificial intelligence published around the world.
The summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Can LLMs Explain Themselves Counterfactually?

Created by
  • Haebom

Author

Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar

Outline

This paper studies the self-explanation ability of large language models (LLMs), specifically the effectiveness of self-generated counterfactual explanations (SCEs). Unlike existing post-hoc explanation methods, the focus is on having the LLM itself explain its own outputs. The authors design and analyze tests that evaluate SCE generation across a range of LLMs, model sizes, temperature settings, and datasets. The analysis shows that LLMs sometimes struggle to generate SCEs at all, and that even when they do, their predictions often fail to agree with their own counterfactual reasoning.
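
The core check described above can be pictured as a simple loop: ask the model for a prediction, ask it to edit the input so that its prediction would flip, then re-query it on its own edit and see whether the flip actually happens. Below is a minimal sketch of that consistency check, not the authors' exact protocol; the `query_llm` helper and the prompt wording are assumptions, to be replaced with whatever client and prompts you actually use.

```python
# Minimal sketch of an SCE consistency check (assumed protocol, not the paper's exact setup).
# `query_llm` is a hypothetical helper that sends a prompt to some chat model
# and returns its text response; plug in any client you like.

from typing import Callable, List, Tuple


def sce_consistency(
    query_llm: Callable[[str], str],        # hypothetical: prompt -> model response
    examples: List[Tuple[str, List[str]]],  # (input text, candidate labels)
) -> float:
    """Fraction of examples where the model's prediction on its own
    self-generated counterfactual matches the intended target label."""
    consistent, total = 0, 0
    for text, labels in examples:
        # 1) Ask the model for its prediction on the original input.
        pred = query_llm(
            f"Classify the following text as one of {labels}. "
            f"Answer with the label only.\n\nText: {text}"
        ).strip()
        # 2) Ask the model to rewrite the input so that it would assign a
        #    different (target) label -- the self-generated counterfactual.
        target = next(label for label in labels if label != pred)
        counterfactual = query_llm(
            f"You labeled the text below as '{pred}'. Minimally edit it so that "
            f"you would instead label it '{target}'. Return only the edited text."
            f"\n\nText: {text}"
        ).strip()
        # 3) Re-query the model on its own counterfactual and check agreement.
        new_pred = query_llm(
            f"Classify the following text as one of {labels}. "
            f"Answer with the label only.\n\nText: {counterfactual}"
        ).strip()
        consistent += int(new_pred == target)
        total += 1
    return consistent / max(total, 1)
```

A consistency score well below 1.0 is the kind of prediction / counterfactual-reasoning mismatch the paper reports.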

Takeaways, Limitations

Takeaways: Presents a systematic method for assessing and analyzing the self-explanation capacity of LLMs, in particular their ability to generate SCEs. This improves understanding of the reliability and limitations of LLM-based explanations.
Limitations: The study covers only one type of self-explanation (SCEs); other forms of self-explanation remain to be examined. It offers no concrete suggestions for improving LLMs' SCE generation, and it lacks an in-depth analysis of why LLMs' counterfactual reasoning and predictions diverge.