The conversational capabilities of large language models (LLMs) offer significant opportunities for scalable, interactive tutoring. However, existing research has focused on Socratic question generation while overlooking a crucial aspect of expert tutoring: adaptive guidance based on the learner's cognitive state. This study therefore shifts the focus from question generation to instructional guidance, asking whether LLMs can emulate expert tutors, who dynamically adjust their strategies according to the learner's state. To this end, we propose GuideEval, a benchmark grounded in real-world educational dialogues that evaluates instructional guidance through a three-step behavioral framework: (1) recognition (inferring the learner's state), (2) accommodation (adapting the teaching strategy accordingly), and (3) prompting (stimulating appropriate reflection).
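As a rough illustration of how such a three-step evaluation might be organized, the following Python sketch scores a single tutoring turn on each phase. All names, data structures, and the scoring interface here are hypothetical assumptions for illustration, not part of the published benchmark; in practice each phase judge could be a rubric-based LLM evaluator or a human annotator.

```python
# Minimal sketch (not the authors' implementation) of a three-phase
# evaluation over one tutoring turn. Every identifier below is hypothetical.
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Phase(Enum):
    RECOGNITION = "recognition"      # infer the learner's cognitive state
    ACCOMMODATION = "accommodation"  # adapt the teaching strategy to that state
    PROMPTING = "prompting"          # stimulate appropriate reflection

@dataclass
class Turn:
    learner_utterance: str  # what the learner said in the dialogue
    tutor_response: str     # the LLM tutor's reply under evaluation

@dataclass
class PhaseScore:
    phase: Phase
    score: float  # assumed scale: 0.0 (fails the phase) to 1.0 (expert-like)

def evaluate_turn(
    turn: Turn,
    judges: dict[Phase, Callable[[Turn], float]],
) -> list[PhaseScore]:
    """Score one tutoring turn on each behavioral phase.

    `judges` maps each phase to a scoring function; this mapping is an
    assumption made for this sketch, not the benchmark's actual protocol.
    """
    return [PhaseScore(phase, judge(turn)) for phase, judge in judges.items()]
```

One design point this sketch makes explicit: because each phase receives its own score rather than a single aggregate, a tutor that asks good Socratic questions (prompting) but ignores the learner's confusion (recognition, accommodation) can be penalized on exactly the dimensions the benchmark targets.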