This paper identifies and analyzes two major problems in traditional group-relative policy optimization (GRPO): (i) tokens frequently appear in completions with both positive and negative rewards, producing conflicting gradient updates that decrease their output probabilities, and (ii) negatively rewarded completions penalize confident responses and shift model decisions toward unlikely tokens, flattening the output distribution and impairing learning. To address these problems, this paper proposes group-relative trajectory-based policy optimization (GTPO). GTPO identifies conflict tokens, i.e., tokens that appear in completions with opposite rewards, and protects them by amplifying positive updates while skipping negative ones. To further prevent policy collapse, GTPO filters out completions whose entropy exceeds a provable threshold. Unlike GRPO, GTPO does not rely on KL-divergence regularization and therefore does not require a reference model during training. Experiments on the GSM8K, MATH, and AIME 2024 benchmarks demonstrate that GTPO provides greater training stability and improved performance.
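To make the two mechanisms described above concrete, the following is a minimal PyTorch-style sketch rather than the paper's exact formulation: the function name `gtpo_style_update_mask`, the fixed `entropy_threshold` and `boost` parameters, and the choice to flag a conflict whenever a token id appears in both positively and negatively rewarded completions of the same group are illustrative assumptions, not details taken from the paper.

```python
import torch

def gtpo_style_update_mask(token_ids, advantages, logits,
                           entropy_threshold=4.0, boost=2.0):
    """Illustrative sketch of the two mechanisms in the abstract.

    token_ids:  (G, T) token ids of G sampled completions for one prompt
    advantages: (G,)   group-relative advantages (sign encodes reward polarity)
    logits:     (G, T, V) policy logits at each generated position
    Returns per-token weights to apply to the policy-gradient loss.
    """
    G, T = token_ids.shape

    # 1) Conflict tokens: ids occurring in both positively and negatively
    #    rewarded completions of the group (a simplifying assumption here).
    pos_tokens = set(token_ids[advantages > 0].flatten().tolist())
    neg_tokens = set(token_ids[advantages < 0].flatten().tolist())
    conflict = pos_tokens & neg_tokens

    weights = torch.ones(G, T)
    for g in range(G):
        for t in range(T):
            if token_ids[g, t].item() in conflict:
                if advantages[g] > 0:
                    weights[g, t] = boost   # amplify the positive update
                else:
                    weights[g, t] = 0.0     # skip the negative update

    # 2) Entropy filtering: drop completions whose mean token entropy is too
    #    high, so the policy is not pushed toward a flat distribution.
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(-1).mean(-1)
    weights[entropy > entropy_threshold] = 0.0
    return weights
```

In this sketch the returned weights would multiply the usual per-token policy-gradient terms; the actual GTPO threshold is derived analytically in the paper rather than fixed by hand.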