This paper presents CHORD, a novel framework that integrates supervised fine-tuning (SFT) and reinforcement learning (RL), the two major post-training methods for improving the capabilities and aligning the behavior of large language models (LLMs). Existing approaches that combine SFT and RL often disrupt established model behaviors and risk overfitting to expert data. CHORD addresses this issue by reformulating SFT as a dynamically weighted auxiliary objective within the on-policy RL process, rather than treating it as a separate stage. Based on an analysis of how off-policy expert data influences training at both global and granular levels, CHORD introduces a dual control mechanism: a global coefficient guides the transition from off-policy imitation to on-policy exploration, while a token-wise weighting function enables fine-grained learning from expert tokens, preserving on-policy exploration and mitigating the disruption caused by off-policy data. Extensive experiments show that CHORD achieves a stable and efficient learning process, yielding significant performance improvements over baseline methods.
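As a minimal sketch of the dual control mechanism described above (the symbols $\mu$, $\phi$, $x$, and $y^*$ are introduced here for illustration and are not taken from the abstract), the training objective can be viewed as a weighted mixture of the on-policy RL loss and a token-wise weighted SFT loss:

\[
\mathcal{L}_{\text{CHORD}}(\theta) \;=\; (1 - \mu)\,\mathcal{L}_{\text{RL}}(\theta) \;+\; \mu\,\mathcal{L}_{\text{SFT}}^{\phi}(\theta),
\]
\[
\mathcal{L}_{\text{SFT}}^{\phi}(\theta) \;=\; -\frac{1}{|y^*|}\sum_{t=1}^{|y^*|} \phi\bigl(\pi_\theta(y^*_t \mid x, y^*_{<t})\bigr)\,\log \pi_\theta(y^*_t \mid x, y^*_{<t}),
\]

where $x$ is a prompt, $y^*$ is an off-policy expert response, $\mu$ is the global coefficient that is scheduled to shift training from imitation toward exploration, and $\phi(\cdot)$ is the token-wise weighting function that modulates how strongly each expert token is imitated.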