Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Language Models Fail to Introspect About Their Knowledge of Language

Created by
  • Haebom

Author

Siyuan Song, Jennifer Hu, Kyle Mahowald

Outline

This paper systematically examines the introspective abilities of 21 open-source large language models (LLMs) in two domains: grammatical knowledge and word prediction. The key idea is that a model's internal linguistic knowledge can be measured directly via the string probabilities it assigns, so the question becomes how faithfully the model's responses to metalinguistic prompts reflect that internal knowledge. We propose a novel introspection metric: the degree to which a model's prompted responses predict its own string probabilities, beyond what is predicted by a different model with highly similar internal knowledge. Although both metalinguistic prompting and direct probability comparison achieved high task accuracy, we found no evidence that LLMs have privileged "self-access." By evaluating a broad range of open-source models and controlling for inter-model similarity, we provide new evidence that LLMs are incapable of introspection and that prompted responses should not be conflated with a model's linguistic generalizations.
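To make the metric concrete, below is a minimal runnable sketch of the self- vs. cross-model agreement comparison, with toy log-probability tables standing in for real LLMs. All helper names, data, and numbers here are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch: does a model's prompted grammaticality judgment track its OWN
# string probabilities better than another model's? Each "model" is just a
# table of log-probabilities over sentences; a real test would query an LLM.
MODEL_A = {"the cat sleeps": -4.1, "the cat sleep": -7.3,
           "she runs fast": -5.0, "she run fast": -6.8}
MODEL_B = {"the cat sleeps": -4.4, "the cat sleep": -7.0,
           "she runs fast": -5.2, "she run fast": -6.5}

PAIRS = [("the cat sleeps", "the cat sleep"),
         ("she runs fast", "she run fast")]

def string_logprob(model, sentence):
    # Stand-in for scoring a sentence's total log-probability under a model.
    return model[sentence]

def metalinguistic_choice(model, s1, s2):
    # Stand-in for prompting the model "Which sentence is grammatical?".
    # In this toy, the prompted answer simply follows the probability ranking;
    # with a real LLM the two measures can diverge, which is the point.
    return s1 if string_logprob(model, s1) > string_logprob(model, s2) else s2

def agreement_rate(judge, scorer, pairs):
    """Fraction of pairs where `judge`'s prompted answer matches the
    sentence that `scorer` assigns higher string probability."""
    hits = sum(metalinguistic_choice(judge, a, b)
               == (a if string_logprob(scorer, a) > string_logprob(scorer, b) else b)
               for a, b in pairs)
    return hits / len(pairs)

# Introspection predicts self-agreement exceeding the cross-model baseline;
# the paper reports no such gap once model similarity is controlled for.
print(agreement_rate(MODEL_A, MODEL_A, PAIRS))  # self-consistency
print(agreement_rate(MODEL_A, MODEL_B, PAIRS))  # cross-model baseline
```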

Takeaways, Limitations

Takeaways: A systematic study of the introspective abilities of LLMs shows that they lack introspection, underscoring that prompt responses should not simply be equated with a model's internal linguistic knowledge. The proposed introspection metric could be a useful tool for assessing introspective ability in LLMs.
Limitations: The study covers only open-source LLMs, so further research is needed on introspection in closed-source models. Because it focuses on only two domains, grammatical knowledge and word prediction, further research is also needed on introspection in other domains.