Despite recent advances in Group Relative Policy Optimization (GRPO), which improves human preference alignment in image and video generation models, two problems persist: high computational cost, driven by on-policy rollouts and the many SDE sampling steps they require, and training instability caused by sparse rewards. To address these issues, we propose BranchGRPO, a method that introduces a branching sampling policy restructuring the SDE sampling process into a tree. By sharing computation across common prefixes and pruning low-reward paths and redundant depths, BranchGRPO maintains or improves exploration diversity while substantially reducing per-update computational cost. Key contributions include: reduced rollout and training cost through branching sampling; a tree-based advantage estimator that incorporates dense process-level rewards; and pruning strategies that exploit path and depth redundancy to improve convergence and performance. Experiments on image and video preference alignment show that BranchGRPO improves alignment scores by 16% over a strong baseline while reducing training time by 50%.
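To make the branching idea concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes a hypothetical `Node` tree, a toy stand-in for the process-level reward model (`toy_reward`), and a simple `keep_top` width-pruning rule, and it omits depth pruning. It only illustrates shared-prefix rollouts, sibling-normalized (tree-based) advantages, and pruning of low-reward branches.

```python
# Illustrative sketch of branching rollouts with prefix sharing, tree-based
# advantage estimation, and width pruning. All names here are assumptions,
# not the BranchGRPO codebase.

from dataclasses import dataclass, field
from typing import List
import random
import statistics


@dataclass
class Node:
    """One denoising segment in the rollout tree; children reuse this prefix."""
    depth: int
    reward: float = 0.0          # toy process-level reward for this segment
    advantage: float = 0.0
    children: List["Node"] = field(default_factory=list)


def toy_reward(parent: Node) -> float:
    """Stand-in for a process-level reward model (illustrative only)."""
    return parent.reward + random.gauss(0.0, 1.0)


def expand(node: Node, branch_factor: int, max_depth: int) -> None:
    """Grow the tree: each node forks into `branch_factor` children, so all
    descendants share the computation of their common prefix."""
    if node.depth >= max_depth:
        return
    for _ in range(branch_factor):
        child = Node(depth=node.depth + 1, reward=toy_reward(node))
        node.children.append(child)
        expand(child, branch_factor, max_depth)


def assign_advantages(node: Node) -> None:
    """Tree-based advantage estimation: normalize rewards within each sibling
    group so every branching depth yields a dense, locally comparable signal."""
    if not node.children:
        return
    rewards = [c.reward for c in node.children]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # avoid division by zero
    for c in node.children:
        c.advantage = (c.reward - mean) / std
        assign_advantages(c)


def prune(node: Node, keep_top: int) -> None:
    """Width pruning: keep only the `keep_top` highest-reward children at each
    branching point, so low-reward paths stop consuming rollout budget."""
    if not node.children:
        return
    node.children.sort(key=lambda c: c.reward, reverse=True)
    node.children = node.children[:keep_top]
    for c in node.children:
        prune(c, keep_top)


if __name__ == "__main__":
    random.seed(0)
    root = Node(depth=0)
    expand(root, branch_factor=3, max_depth=4)   # shared-prefix rollout tree
    assign_advantages(root)                      # dense per-depth advantages
    prune(root, keep_top=2)                      # drop low-reward branches
```

In this sketch, sibling-group normalization plays the role of the group-relative baseline in GRPO, while pruning caps the number of surviving paths per depth; the actual method additionally fuses rewards across depths and prunes redundant depths.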