Current evaluation methods for large language models (LLMs) suffer from overestimation, evaluation bias, and mismatched question difficulty, which hinder their effective application and optimization. To address these issues, this paper proposes Agent-as-Interviewer, a dynamic evaluation method in which LLM agents conduct multi-step interactions with the target model. The agents invoke knowledge tools to draw on broader and deeper knowledge during dynamic, multi-step question generation, and they plan query strategies that adjust question difficulty to match the target LLM's actual capabilities. Building on this evaluation method, we develop JudgeAgent, a dynamic evaluation framework that uses knowledge-based synthesis as the agent's tool and difficulty scores as strategy guidance. JudgeAgent provides actionable suggestions for improving the target model, and experiments demonstrate that Agent-as-Interviewer accurately identifies the knowledge and capability boundaries of target models.
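
To make the interviewer loop concrete, the following is a minimal sketch, assuming hypothetical `knowledge_tool`, `question_gen`, and `grader` callables and a simple proportional difficulty update; it illustrates how an agent could alternate tool-backed question generation with difficulty-aware query planning, and is not the paper's actual implementation.

```python
# Sketch of an Agent-as-Interviewer-style loop (illustrative interfaces only).
from dataclasses import dataclass, field


@dataclass
class InterviewState:
    difficulty: float = 0.5                        # planned difficulty in [0, 1]
    history: list = field(default_factory=list)    # (question, answer, correct) turns


def interview(target_llm, knowledge_tool, question_gen, grader, max_turns=10):
    """Dynamically interview `target_llm`, adapting question difficulty per turn."""
    state = InterviewState()
    for _ in range(max_turns):
        # 1. Invoke a knowledge tool to gather broader/deeper context.
        context = knowledge_tool(state.history)
        # 2. Generate the next question at the currently planned difficulty.
        question = question_gen(context, state.difficulty)
        # 3. Query the target model and grade its answer.
        answer = target_llm(question)
        correct = grader(question, answer)
        state.history.append((question, answer, correct))
        # 4. Plan the next query: raise difficulty after a correct answer,
        #    lower it after a mistake.
        if correct:
            state.difficulty = min(1.0, state.difficulty + 0.1)
        else:
            state.difficulty = max(0.0, state.difficulty - 0.1)
    # The trajectory serves as evidence of the model's knowledge/capability boundary.
    return state.history
```

In this sketch the interview trajectory, rather than a single static score, is the evaluation output; a downstream step could summarize it into difficulty-stratified feedback for the target model.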