Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the authors and their institutions. When sharing, please cite the source.

The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations

Created by
  • Haebom

Author

Yubo Zhu, Dongrui Liu, Zecheng Lin, Wei Tong, Sheng Zhong, Jing Shao

Outline

This paper proposes an efficient method for estimating how difficult an input question is for a large language model (LLM). Existing approaches rely on repeated response sampling, auxiliary models, or fine-tuning the target model itself, which incurs substantial computational cost and generalizes poorly. The authors instead estimate difficulty solely from the hidden representations produced by the target LLM: they model the token-level generation process as a Markov chain and define a value function that predicts the expected output quality from any hidden state. This allows efficient and accurate difficulty estimation from the initial hidden state alone, before any output tokens are generated. Extensive experiments across text and multimodal tasks show that the method outperforms existing baselines at difficulty estimation, and pairing it with adaptive inference strategies such as self-consistency, best-of-N, and self-refine yields high inference efficiency with fewer generated tokens.
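The core idea can be sketched as a lightweight value head applied to the question's hidden state. The sketch below is illustrative only: `value_head` stands in for the paper's value function, and the probe parameters `w` and `b` are hypothetical (in practice they would be trained offline to predict output quality from hidden states).

```python
import numpy as np

def value_head(hidden_state: np.ndarray, w: np.ndarray, b: float) -> float:
    """Illustrative value function V(h): map a hidden state to an
    expected-output-quality score in (0, 1) via a linear probe + sigmoid.
    The paper's actual value function may differ; w and b are hypothetical
    probe parameters assumed to have been fit offline."""
    logit = float(hidden_state @ w + b)
    return 1.0 / (1.0 + np.exp(-logit))

def estimated_difficulty(hidden_state: np.ndarray, w: np.ndarray, b: float) -> float:
    # Difficulty is taken here as the complement of expected quality:
    # low predicted quality from the initial state means a hard question.
    return 1.0 - value_head(hidden_state, w, b)

# Toy demonstration with a random stand-in for the LLM's initial hidden state.
rng = np.random.default_rng(0)
d = 16                                  # toy hidden size
h = rng.standard_normal(d)              # stand-in for the initial hidden state
w = rng.standard_normal(d) / np.sqrt(d)
score = estimated_difficulty(h, w, b=0.0)
print(f"estimated difficulty: {score:.3f}")
```

Because only the initial hidden state is needed, this estimate costs a single forward pass over the question, with no output-token generation.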

Takeaways, Limitations

Takeaways:
Presents a novel method for efficiently estimating the difficulty of input questions for LLMs.
Difficulty can be estimated using only the initial hidden state without generating output tokens.
Improved difficulty estimation performance compared to existing methods.
Improved inference efficiency when combined with adaptive inference strategies.
Limitations:
Further research is needed to determine the generality of the proposed method and its applicability to various LLMs.
Lacks detailed discussion of the design and optimization of the value function.
Potential performance bias in difficulty estimates for certain types of questions or tasks.