This paper proposes Group-in-Group Policy Optimization (GiGPO), a novel algorithm that addresses the challenge of scaling group-based reinforcement learning (RL) to long-horizon LLM agent training. While preserving the appealing properties of existing group-based RL (critic-free operation, low memory footprint, and stable convergence), GiGPO achieves fine-grained step-level credit assignment through a hierarchical structure that computes relative advantages at both the episode level and the step level. At the episode level, a macroscopic relative advantage is computed over groups of complete trajectories; at the step level, a microscopic relative advantage is estimated via an anchor state grouping mechanism that identifies recurring environment states across trajectories and retrospectively constructs step-level groups around them. Evaluations on the ALFWorld and WebShop benchmarks with Qwen2.5-1.5B-Instruct and Qwen2.5-7B-Instruct show gains of over 12% on ALFWorld and over 9% on WebShop compared to a GRPO baseline, while incurring the same GPU memory overhead, the same number of LLM rollouts, and little to no additional wall-clock time.
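To make the two-level grouping concrete, the following is a minimal Python sketch of how episode-level and step-level relative advantages could be combined. The per-step reward structure, discount factor `gamma`, mixing weight `omega`, and mean/std normalization within each group are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of GiGPO-style hierarchical relative advantages.
# Episode level: normalize total returns within a group of trajectories.
# Step level: group steps that share the same anchor (recurring) environment
# state and normalize their discounted returns within each anchor group.
from collections import defaultdict
import numpy as np


def episode_advantages(episode_returns, eps=1e-8):
    """Macroscopic advantage: group-relative normalization of episode returns."""
    r = np.asarray(episode_returns, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)


def step_advantages(trajectories, gamma=0.95, eps=1e-8):
    """Microscopic advantage: group steps by anchor state, normalize within group.

    Each trajectory is a list of dicts with hashable "state" and scalar "reward".
    """
    groups = defaultdict(list)  # anchor state -> [(traj_idx, step_idx, return-to-go)]
    for ti, traj in enumerate(trajectories):
        rewards = [step["reward"] for step in traj]
        returns, g = [0.0] * len(rewards), 0.0
        for t in reversed(range(len(rewards))):  # discounted return-to-go
            g = rewards[t] + gamma * g
            returns[t] = g
        for si, step in enumerate(traj):
            groups[step["state"]].append((ti, si, returns[si]))

    adv = {ti: [0.0] * len(traj) for ti, traj in enumerate(trajectories)}
    for members in groups.values():
        rets = np.array([m[2] for m in members], dtype=np.float64)
        normed = (rets - rets.mean()) / (rets.std() + eps)
        for (ti, si, _), a in zip(members, normed):
            adv[ti][si] = float(a)
    return adv


def gigpo_advantages(trajectories, episode_returns, omega=1.0, gamma=0.95):
    """Combine macroscopic (episode) and microscopic (step) relative advantages."""
    ep_adv = episode_advantages(episode_returns)
    st_adv = step_advantages(trajectories, gamma=gamma)
    return {ti: [ep_adv[ti] + omega * a for a in st_adv[ti]]
            for ti in range(len(trajectories))}
```

In this sketch, the combined per-step advantage would then weight the token-level policy-gradient loss in place of a learned critic's value estimates, which is what keeps the method critic-free while still assigning credit at the step level.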