Advances in reinforcement learning (RL) have improved the agentic capabilities of large language models (LLMs). However, in long-horizon, multi-turn agent tasks, existing approaches that rely solely on outcome rewards suffer from sparse supervision. To address this issue, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree search, where each tree node represents a complete agent interaction step. By sharing common prefixes, tree-search sampling increases the number of rollouts achievable within a fixed budget of tokens or tool calls. Moreover, the tree-structured trajectories naturally allow the construction of step-wise process supervision signals even when only outcome rewards are available. Building on this, Tree-GRPO estimates grouped relative advantages at both the intra-tree and inter-tree levels. Theoretical analysis shows that the objective of intra-tree group relative policy optimization is equivalent to that of step-level direct preference learning. Experiments on 11 datasets spanning three types of QA tasks demonstrate that the proposed tree-based RL outperforms chain-based RL methods.
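To make the two-level advantage estimation concrete, the sketch below computes GRPO-style normalized advantages within each tree (intra-tree, comparing rollouts that branch from a shared prefix) and across all rollouts (inter-tree), then sums the two signals. This is a minimal illustration under assumed choices: the grouping of rollouts into `trees`, the mean/std normalization, and the additive combination are assumptions for exposition, not the paper's exact formulation.

```python
# Minimal sketch of grouped relative advantage estimation over tree-structured rollouts.
# Assumptions: each tree is represented only by the outcome rewards of its rollouts,
# and intra- and inter-tree advantages are combined by simple addition.
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage within one group: (r - mean) / (std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def tree_grpo_advantages(trees, eps=1e-8):
    """`trees`: list of trees; each tree is a list of outcome rewards,
    one per rollout sharing that tree's common prefix.

    Returns one advantage per rollout, combining
      - intra-tree advantages (relative to siblings in the same tree), and
      - inter-tree advantages (relative to all rollouts across trees).
    """
    # Intra-tree: comparing rollouts that branch from the same shared prefix
    # yields a step-level preference signal from outcome rewards alone.
    intra = [group_relative_advantages(tree_rewards, eps) for tree_rewards in trees]

    # Inter-tree: compare all rollouts for the same prompt, as in chain-based GRPO.
    flat = [r for tree_rewards in trees for r in tree_rewards]
    inter_flat = group_relative_advantages(flat, eps)

    # Re-nest the inter-tree advantages and sum the two signals (assumed combination).
    advantages, k = [], 0
    for t_idx, tree_rewards in enumerate(trees):
        advantages.append(
            [intra[t_idx][i] + inter_flat[k + i] for i in range(len(tree_rewards))]
        )
        k += len(tree_rewards)
    return advantages


# Example: two trees of rollouts with binary outcome rewards.
print(tree_grpo_advantages([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]))
```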