This paper aims to provide ubiquitous intelligent services by integrating wireless communications with large language models (LLMs). In collaborative wireless edge environments, the tradeoff between inference quality and end-to-end latency is a critical issue: offloading simple queries to the edge server incurs excessive latency, while on-device models underperform on complex tasks, creating a mismatch between task complexity and resource allocation. To address this, we propose a dynamic, quality-latency-aware routing framework that coordinates inference between lightweight models on mobile devices and powerful models on edge servers. For single-turn queries, the framework combines BERT-predicted semantic scores with communication and computation overhead; for multi-turn conversations, it employs two distinct cost models that further quantify the context-aware costs incurred by model switching and KV cache management. Extensive experiments show that, compared with competing baselines on the MMLU, GSM8K, and MT-Bench-101 benchmarks, the proposed framework reduces average response latency by 5-15% and large-scale model invocations by 10-20%, while maintaining full inference quality.
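To make the routing decision concrete, the sketch below illustrates one plausible form of a quality-latency-aware router under simplifying assumptions: a BERT-style classifier supplies a predicted probability that the on-device model answers acceptably, and the router trades this against estimated on-device, communication, and model-switching latencies. The function and parameter names (`route_query`, `RoutingCosts`, `latency_weight`, etc.) are hypothetical illustrations, not the paper's actual formulation.

```python
# Illustrative sketch of quality-latency-aware routing between an on-device
# small model and an edge-server large model. All names and weights are
# hypothetical; the paper's cost models are more detailed.
from dataclasses import dataclass


@dataclass
class RoutingCosts:
    local_latency_s: float      # estimated on-device inference time
    edge_latency_s: float       # estimated uplink + edge inference + downlink time
    switch_cost_s: float = 0.0  # e.g., KV-cache re-prefill when switching models mid-dialogue


def route_query(predicted_quality: float,
                costs: RoutingCosts,
                edge_quality: float = 1.0,
                latency_weight: float = 0.5) -> str:
    """Return 'local' or 'edge' by trading predicted quality against latency.

    predicted_quality: probability (e.g., from a BERT classifier) that the
    on-device model answers this query acceptably.
    """
    # Utility of answering on-device: predicted quality minus a latency penalty.
    local_utility = predicted_quality - latency_weight * costs.local_latency_s
    # Utility of offloading: assume near-full quality, pay communication and switch costs.
    edge_utility = edge_quality - latency_weight * (costs.edge_latency_s + costs.switch_cost_s)
    return "local" if local_utility >= edge_utility else "edge"


# Example: a simple factual query the small model likely handles well stays on-device.
print(route_query(0.92, RoutingCosts(local_latency_s=0.4, edge_latency_s=1.2)))  # -> 'local'
```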