This paper proposes a novel method for efficiently estimating the difficulty of input questions for a large language model (LLM). Existing approaches rely on repeated response sampling, auxiliary models, or fine-tuning of the target model itself, which incurs substantial computational cost and generalizes poorly. We instead estimate difficulty using only the hidden representations produced by the target LLM. We model the token-level generation process as a Markov chain over hidden states and define a value function that estimates the expected output quality from any given hidden state. This enables efficient and accurate difficulty estimation from the initial hidden state alone, before any output tokens are generated. Extensive experiments on a range of textual and multimodal tasks show that the proposed method outperforms baseline approaches at difficulty estimation. Moreover, when combined with adaptive inference strategies such as self-consistency, best-of-N, and self-refine, our difficulty estimates improve inference efficiency by reducing the number of generated tokens.
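
As an illustrative formalization of the value-function idea (the notation below is ours; the paper body gives the exact definitions): write $h_t$ for the hidden state after $t$ generated tokens and $r(y)$ for the quality of a completed output $y$. Treating generation as a Markov chain $h_{t+1} \sim P(\cdot \mid h_t)$, the value function and its self-consistency condition (which follows from the Markov property and the tower rule) are
\[
V(h_t) = \mathbb{E}\!\left[\, r(y) \mid h_t \,\right], \qquad V(h_t) = \mathbb{E}_{h_{t+1} \sim P(\cdot \mid h_t)}\!\left[\, V(h_{t+1}) \,\right],
\]
so the difficulty of a question $x$ can be read off from its initial hidden state alone, e.g. as $d(x) = 1 - V\!\left(h_0(x)\right)$ (this particular difficulty score is an assumed instantiation, shown only to make the mechanism concrete).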