In this paper, we present a novel cache-aware routing strategy for the efficient deployment of mixture-of-experts (MoE) large language models (LLMs) in memory-constrained environments. MoE LLMs improve performance by selectively activating specific experts for each input, but they are difficult to deploy on memory-constrained devices, especially for sequential token generation with batch size 1. Our cache-aware routing strategy improves cache locality by leveraging expert reuse during token generation, optimizing MoE inference on devices where only a subset of expert weights can be loaded into DRAM. On-device results demonstrate a 2x speedup on mobile devices across language modeling, MMLU, and GSM8K benchmarks. As a flexible solution that requires no training, our approach extends the applicability of MoE to real-world applications.
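To illustrate the general idea of cache-aware expert routing, the sketch below biases the router's top-k selection toward experts already resident in a fixed-size DRAM cache, with an LRU policy standing in for the eviction scheme. This is a minimal illustration under our own assumptions; the class name, cache size, bias value, and LRU policy are hypothetical and not the paper's exact algorithm.

```python
# Illustrative sketch of cache-aware expert routing (assumptions: the
# bias value, cache size, and LRU eviction policy are placeholders).
from collections import OrderedDict
import torch


class CacheAwareRouter:
    def __init__(self, num_experts: int, top_k: int,
                 cache_size: int, cache_bias: float = 1.0):
        self.top_k = top_k
        self.cache_bias = cache_bias
        self.cache_size = cache_size      # number of experts that fit in DRAM
        self.cache = OrderedDict()        # expert_id -> weights, in LRU order

    def route(self, router_logits: torch.Tensor) -> list[int]:
        """Pick top-k experts, preferring experts already cached in DRAM."""
        biased = router_logits.clone()
        for expert_id in self.cache:
            biased[expert_id] += self.cache_bias  # favor resident experts
        chosen = torch.topk(biased, self.top_k).indices.tolist()
        for expert_id in chosen:
            self._touch(expert_id)
        return chosen

    def _touch(self, expert_id: int) -> None:
        # LRU update: a cache miss would trigger a load from slower storage.
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)
        else:
            if len(self.cache) >= self.cache_size:
                self.cache.popitem(last=False)    # evict least recently used
            self.cache[expert_id] = None          # placeholder for expert weights
```

Biasing the routing decision rather than hard-restricting it to cached experts is one way to trade a small routing perturbation for far fewer expert-weight loads per generated token.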