This paper analyzes two key problems of the existing Group Relative Policy Optimization (GRPO): (i) conflicting gradient updates that arise when the same token receives both positive and negative rewards across completions in a group, and (ii) negatively rewarded completions penalizing confident responses and shifting model decisions toward less probable tokens, which flattens the output distribution and impedes learning. To address these issues, we propose Group-relative Trajectory-based Policy Optimization (GTPO), which identifies conflicting tokens, amplifying their positive updates while skipping the negative ones. GTPO further prevents policy collapse by filtering out completions whose entropy exceeds a certain threshold. Unlike GRPO, GTPO does not rely on KL-divergence regularization, eliminating the need for a reference model during training. We demonstrate improved performance and stability through multiple experiments on the GSM8K, MATH, and AIME 2024 benchmarks.
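The two mechanisms summarized above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: the function name, the per-completion scalar advantages and entropies, the amplification factor of 2.0, and the entropy threshold are all assumptions made for clarity.

```python
def gtpo_token_weights(token_ids, advantages, entropies, entropy_threshold=2.0):
    """Illustrative sketch of GTPO-style token weighting (all names assumed).

    token_ids:  per-completion lists of token ids for one sampled group
    advantages: per-completion scalar advantages (sign = reward direction)
    entropies:  per-completion mean policy entropy

    Returns per-completion, per-token update weights: conflicting tokens
    (appearing in both positively and negatively rewarded completions) have
    their positive updates amplified and their negative updates skipped;
    high-entropy completions are filtered out entirely.
    """
    pos_tokens = {t for seq, a in zip(token_ids, advantages) if a > 0 for t in seq}
    neg_tokens = {t for seq, a in zip(token_ids, advantages) if a < 0 for t in seq}
    conflict = pos_tokens & neg_tokens  # tokens pulled in both directions

    weights = []
    for seq, a, h in zip(token_ids, advantages, entropies):
        if h > entropy_threshold:
            # entropy filtering: drop the whole completion from the update
            weights.append([0.0] * len(seq))
            continue
        w = []
        for t in seq:
            if t in conflict:
                # amplify positive updates, skip negative ones (factor assumed)
                w.append(2.0 if a > 0 else 0.0)
            else:
                w.append(1.0)  # non-conflicting tokens update normally
        weights.append(w)
    return weights
```

For example, with two completions `[1, 2]` (positive advantage) and `[2, 3]` (negative advantage), token `2` is conflicting: its positive update is amplified while its negative update is zeroed out.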