This paper proposes ProactiveEval, a unified framework for evaluating the proactive conversational abilities of large language models (LLMs). Prior studies have focused on specific domains or task-oriented scenarios, limiting comprehensive exploration of models' proactive conversational abilities. To address this, we decompose proactive conversation into two aspects, goal planning and conversation guidance, and establish evaluation metrics across multiple domains. Furthermore, the framework automatically generates diverse and challenging evaluation data. We develop 328 evaluation environments spanning six distinct domains and experiment with 22 LLMs, finding that DeepSeek-R1 and Claude-3.7-Sonnet perform best on the goal planning and conversation guidance tasks, respectively. Finally, we investigate how reasoning ability affects proactive behavior and discuss implications for future model development.