We approach Direct Preference Optimization (DPO) from a Bayesian perspective, interpreting it as a process that learns the differential information required to update the reference policy into the target policy. To formalize this view, we introduce the Differential Information Distribution (DID) and show that DPO's log-ratio reward is justified through the DID. We also analyze how the characteristics of the DID shape DPO's training dynamics and downstream performance.
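For reference, below is the standard DPO objective in which the log-ratio reward appears; the notation (policy $\pi_\theta$, reference policy $\pi_{\mathrm{ref}}$, scaling coefficient $\beta$, sigmoid $\sigma$, and preference pair $(y_w, y_l)$ for prompt $x$) follows the original DPO formulation and is not defined in this section.

\[
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim \mathcal{D}}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
\]

The terms $\beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$ constitute the log-ratio reward whose Bayesian justification we develop via the DID.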