Following the PRISMA guidelines, this paper systematically reviews 23 empirical studies published between 2017 and 2025 to analyze the limitations of existing research on the user experience (UX) evaluation of conversational recommender systems (CRSs) and to suggest future research directions. In particular, we point out the paucity of research on the UX evaluation of adaptive CRSs and of CRSs built on large language models (LLMs). We analyze UX concept definitions, measurement methods, application domains, adaptivity, and the influence of LLMs. We uncover shortcomings such as the dominance of follow-up evaluations, the rare assessment of turn-level emotional UX components, and the rare linking of adaptive behaviors to UX outcomes. We also highlight the epistemological opacity and verbosity of LLM-based CRSs. To foster more transparent, engaging, and user-centered CRS evaluation practices, we propose a structured synthesis of UX metrics, a comparative analysis of adaptive and non-adaptive systems, and a forward-looking agenda for UX evaluation that accounts for LLMs.