This paper highlights that the performance of Direct Preference Optimization (DPO), which has emerged as an effective method for aligning large language models (LLMs) with human preferences, depends critically on the quality of the underlying human preference data. While prior research has explored various data selection strategies, these approaches overlook the evolving state of the language model during the optimization process. This paper therefore introduces a novel problem: sample scheduling for DPO, which aims to dynamically and adaptively schedule training samples based on the model's batch-by-batch state as preference optimization proceeds. To address it, we propose SamS, an efficient and effective algorithm that adaptively selects samples from each training batch based on the LLM's learning feedback, so as to maximize potential generalization performance. By incorporating SamS into DPO, we achieve significant performance improvements across tasks without modifying the core DPO algorithm and with minimal additional computational overhead.
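To make the batch-level scheduling idea concrete, the following minimal PyTorch sketch wraps a standard per-sample DPO loss with a selection step before each update. The scoring rule shown (ranking samples by their per-sample DPO loss, i.e., the implicit reward margin) and all function names are illustrative assumptions, not the paper's actual SamS criterion.

```python
# Hypothetical sketch of per-batch sample scheduling around a DPO update.
# The selection score below is an assumed stand-in for SamS's learning feedback.
import torch
import torch.nn.functional as F

def dpo_per_sample_loss(policy_chosen_logps, policy_rejected_logps,
                        ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss without reduction; returns one loss per sample."""
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    return -F.logsigmoid(logits)          # shape: (batch_size,)

def scheduled_dpo_step(batch_logps, optimizer, keep_ratio=0.5, beta=0.1):
    """One training step that schedules (selects) a subset of the batch.

    `batch_logps` holds per-sample log-probabilities from the policy and the
    frozen reference model. The score used here (larger loss = smaller reward
    margin = presumed more informative) is only an assumption for illustration.
    """
    losses = dpo_per_sample_loss(batch_logps["policy_chosen"],
                                 batch_logps["policy_rejected"],
                                 batch_logps["ref_chosen"],
                                 batch_logps["ref_rejected"], beta)

    scores = losses.detach()                       # feedback signal (assumed)
    k = max(1, int(keep_ratio * losses.numel()))   # scheduled subset size
    selected = torch.topk(scores, k).indices

    loss = losses[selected].mean()                 # update only on the subset
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design point reflected in the sketch is that the core DPO objective is untouched; scheduling only decides which samples in the current batch contribute to the gradient, so the extra cost is limited to scoring and ranking the batch.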