This paper examines the practical effectiveness of spoken dialogue models (SDMs) and identifies their shortcomings relative to well-established text-based large language models (LLMs). Given the complexity of spoken dialogue, we highlight the challenges posed by linguistic and phonetic characteristics such as polysemy, homonyms, and contextual dependence. To address these challenges, we present a benchmark dataset of 1,079 instances in English and Chinese, and evaluate SDM performance using an LLM-based evaluation method.