This paper focuses on privacy issues in reinforcement learning (RL), particularly the risk of privacy inference attacks in cyber-physical systems (CPS). Existing centralized differential privacy (DP) models rely on a trusted server, while fully distributed local models suffer from severe performance degradation. This paper therefore proposes a novel algorithm for episodic RL, Shuffled Differential Privacy Policy Elimination (SDP-PE), built on the shuffle model, an intermediate trust model. SDP-PE balances privacy and learning performance by introducing an exponential batch schedule and a "forget" mechanism, achieving a near-optimal regret upper bound and a privacy-regret trade-off superior to that of local models. These results demonstrate the applicability of the shuffle model to secure data-driven CPS control.
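As a rough illustration of the shuffle model's trust structure referenced above (not the SDP-PE algorithm itself), the sketch below shows users locally randomizing a private bit, a shuffler uniformly permuting the reports to break the link between users and messages, and an analyzer that sees only the anonymous aggregate. All function names, the randomized-response mechanism, and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math
import random

def local_randomize(bit, epsilon):
    """Randomized response: each user reports truthfully with
    probability e^eps / (1 + e^eps), otherwise flips the bit.
    (Illustrative local randomizer, not the paper's mechanism.)"""
    p_keep = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if random.random() < p_keep else 1 - bit

def shuffle_and_aggregate(reports):
    """The shuffler permutes reports uniformly at random, so the
    analyzer receives an anonymous multiset rather than
    per-user messages, amplifying the local privacy guarantee."""
    shuffled = list(reports)
    random.shuffle(shuffled)
    return sum(shuffled)

random.seed(0)
true_bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
reports = [local_randomize(b, epsilon=1.0) for b in true_bits]
count = shuffle_and_aggregate(reports)
print(count)
```

The key design point the shuffle model exploits is that anonymization via shuffling amplifies each user's local privacy guarantee, so weaker (less noisy) local randomizers suffice for the same end-to-end privacy level, which is why intermediate-trust schemes can achieve better utility than purely local ones.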