This paper proposes Selective Preference Optimization (SePO), a novel selective alignment strategy for large language models that centers on efficient key token selection. Unlike existing token-level alignment methods that either optimize all available tokens or rely on complex and costly key token selection strategies, SePO introduces the first token selection method based on Direct Preference Optimization (DPO): an oracle model is trained to estimate a token-level reward function over the target data. The method applies to any existing alignment dataset with response-level annotations and enables cost-efficient token selection with a small-scale oracle model and training data. The estimated reward function scores all tokens in the target dataset, and only the key tokens are selected to supervise the target policy model with a reference-model-free contrastive objective. Extensive experiments on three public evaluation benchmarks show that SePO significantly outperforms competitive baseline methods while optimizing only 30% of the tokens in the target dataset. Applying SePO to weak-to-strong generalization shows that a weak oracle model effectively supervises strong policy models with up to 16.8 times more parameters. Furthermore, SePO effectively selects key tokens from out-of-distribution data, improving the strong policy model and mitigating the overfitting problem.
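
The scoring-and-selection step described above can be illustrated with a minimal sketch: per-token rewards are estimated as the DPO implicit reward, i.e., the scaled log-probability ratio between the DPO-trained oracle and its frozen reference, and the highest-scoring 30% of response tokens are kept. The model paths, the `BETA` temperature, the `select_key_tokens` helper, and the simple top-ratio selection rule are illustrative assumptions for this sketch, not the paper's released code or its exact selection criteria.

```python
# Hypothetical sketch: score response tokens with a small DPO-trained oracle
# and keep only the top ~30% as key tokens. Paths and hyperparameters are
# placeholders (assumptions), not artifacts from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ORACLE = "path/to/dpo-trained-oracle"    # small oracle trained with DPO (assumption)
REFERENCE = "path/to/oracle-reference"   # its frozen SFT reference model (assumption)
BETA = 0.1                               # DPO temperature (assumption)
KEEP_RATIO = 0.3                         # fraction of tokens kept as key tokens

tokenizer = AutoTokenizer.from_pretrained(ORACLE)
oracle = AutoModelForCausalLM.from_pretrained(ORACLE).eval()
reference = AutoModelForCausalLM.from_pretrained(REFERENCE).eval()


@torch.no_grad()
def token_log_probs(model, input_ids):
    """Per-token log-probabilities of each token given its prefix under `model`."""
    logits = model(input_ids).logits[:, :-1, :]   # position t predicts token t+1
    targets = input_ids[:, 1:]
    logps = torch.log_softmax(logits, dim=-1)
    return logps.gather(-1, targets.unsqueeze(-1)).squeeze(-1)


@torch.no_grad()
def select_key_tokens(prompt, response):
    """Return a 0/1 mask over response tokens marking the top KEEP_RATIO by reward."""
    input_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]

    # DPO-style implicit token reward: beta * log(pi_oracle / pi_ref) per token.
    rewards = BETA * (token_log_probs(oracle, input_ids)
                      - token_log_probs(reference, input_ids))
    rewards = rewards[:, prompt_len - 1:]         # keep response positions only

    k = max(1, int(KEEP_RATIO * rewards.shape[1]))
    threshold = torch.topk(rewards, k, dim=-1).values[:, -1:]
    return (rewards >= threshold).long()          # 1 = key token to supervise
```

The resulting mask would then restrict the contrastive policy-training loss to the selected key tokens; for dispreferred responses the selection direction could be mirrored, though that choice is not shown here.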