Direct Preference Optimization (DPO) is a simple and effective approach for aligning large language models (LLMs) with human preferences without relying on a learned reward model. This study systematically investigates which characteristics of preference data matter most for DPO performance. We show that the quality of chosen responses plays a crucial role in optimizing the DPO objective, while the quality of rejected responses may have a comparatively limited impact. In an online DPO configuration, training on chosen responses behaves similarly to supervised learning, and experiments across a variety of tasks demonstrate that improving the quality of chosen responses consistently improves performance.
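For context, the standard DPO objective over a preference dataset $\mathcal{D}$ of prompts $x$ with chosen responses $y_w$ and rejected responses $y_l$ can be written as follows; the exact variant used here may differ in detail, but the roles of the two responses are the same ($\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ the reference policy, $\beta$ the temperature hyperparameter, and $\sigma$ the logistic function):
\[
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]
The chosen response $y_w$ enters the loss only through its log-likelihood ratio term, which is the sense in which increasing the likelihood of high-quality chosen responses resembles supervised learning on those targets.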