This paper builds on Group Relative Policy Optimization (GRPO), which improves human preference alignment in image and video generation models. Existing GRPO methods suffer from high computational cost, due to on-policy rollouts and many Stochastic Differential Equation (SDE) sampling steps, as well as from training instability caused by sparse rewards. To address these issues, we propose BranchGRPO, a novel method that introduces a branching sampling policy that restructures the sequential SDE sampling process into a tree. By sharing computation across common prefixes and pruning low-reward paths and redundant depths, BranchGRPO substantially lowers per-update computational cost while preserving or improving exploration diversity. Key contributions include: (1) reduced rollout and training cost through branching sampling; (2) a tree-based advantage estimator that incorporates dense process-level rewards; and (3) faster convergence and stronger performance through pruning strategies that exploit path and depth redundancy. Experimental results demonstrate that BranchGRPO improves alignment scores by 16% and reduces training time by 50% compared to a strong baseline.
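To make the branching-and-pruning idea concrete, the following is a minimal NumPy sketch of how tree-structured rollouts with shared prefixes, reward-based pruning, and group-normalized dense advantages could fit together. All names (`sde_step`, `branch_rollout`, `tree_advantages`) and the specific pruning and normalization choices are illustrative assumptions, not the paper's implementation.

```python
"""Illustrative sketch of tree-structured rollouts with pruning (not the authors' code)."""
import numpy as np

rng = np.random.default_rng(0)

def sde_step(x, t):
    """Stand-in for one SDE denoising step of a diffusion sampler."""
    return x + 0.1 * rng.normal(size=x.shape)

def reward(x):
    """Stand-in for a process-level reward model score."""
    return float(-np.linalg.norm(x))

def branch_rollout(x0, depth=4, width=2, keep=2):
    """Roll out a tree of SDE trajectories.

    At each depth, every surviving node is expanded into `width` children that
    reuse their parent's prefix (so the shared prefix is computed only once),
    and only the `keep` highest-reward branches survive, pruning low-reward paths.
    Returns a list of (path, per-step rewards) for the surviving branches.
    """
    frontier = [([x0], [reward(x0)])]
    for t in range(depth):
        children = []
        for path, rs in frontier:
            for _ in range(width):                      # branch from the shared prefix
                x = sde_step(path[-1], t)
                children.append((path + [x], rs + [reward(x)]))
        # prune: keep the `keep` best branches by their latest dense reward
        children.sort(key=lambda pr: pr[1][-1], reverse=True)
        frontier = children[:keep]
    return frontier

def tree_advantages(leaves):
    """Group-relative advantages across surviving branches, per step (dense rewards)."""
    rewards = np.array([rs for _, rs in leaves])        # shape: (num_branches, depth + 1)
    mean, std = rewards.mean(axis=0), rewards.std(axis=0) + 1e-8
    return (rewards - mean) / std                       # normalized within the group

leaves = branch_rollout(np.zeros(8))
adv = tree_advantages(leaves)
print("kept branches:", len(leaves), "advantage shape:", adv.shape)
```

In this toy version, prefix sharing and pruning cap the number of full-length trajectories per update at `keep`, while the per-step rewards along each branch provide the dense, process-level signal that the tree-based advantage estimator normalizes within the group.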