Recent advances in large language models (LLMs) and vision-language models (VLMs) have made web agents essential for automating web interactions. However, training web agents with reinforcement learning suffers from inaccurate credit assignment, prohibitive annotation costs, and sparse rewards. To address these issues, we propose Tree-Guided Preference Optimization (TGPO), an offline reinforcement learning framework that represents trajectories as a tree and merges semantically equivalent states, thereby eliminating conflicting preference labels. TGPO further integrates a process reward model that automatically generates fine-grained rewards through subgoal progression, overlap detection, and action verification, together with a dynamic weighting mechanism that prioritizes high-impact decision points during learning. Experiments on Online-Mind2Web and our self-constructed C-WebShop dataset show that TGPO significantly outperforms existing methods, achieving higher success rates with fewer redundant steps.
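As a rough illustration of the mechanisms named above, the sketch below (not the authors' implementation) merges logged trajectories into a tree of semantically equivalent states, derives a simple divergence-based weight for each decision point, and emits weighted action-preference pairs of the kind a DPO-style objective could consume. The `state_key` heuristic, the trajectory format, and the weighting formula are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class StateNode:
    """One node in the merged tree: all trajectory steps that share a semantic key."""
    key: str
    # action -> list of (process_reward, trajectory_success) observations for that action
    actions: Dict[str, List[Tuple[float, bool]]] = field(default_factory=dict)
    # action -> next state node (kept to expose the tree structure; unused by the
    # simple weighting below)
    children: Dict[str, "StateNode"] = field(default_factory=dict)


def state_key(observation: str) -> str:
    """Hypothetical semantic signature; a real system might use URL/DOM features or an LLM."""
    return observation.strip().lower()


def merge_trajectories(
    trajectories: List[Tuple[List[Tuple[str, str, float]], bool]]
) -> Dict[str, StateNode]:
    """Merge (observation, action, process_reward) steps, collapsing equivalent states."""
    nodes: Dict[str, StateNode] = {}
    for steps, success in trajectories:
        prev: Optional[StateNode] = None
        prev_action: Optional[str] = None
        for obs, action, reward in steps:
            key = state_key(obs)
            node = nodes.setdefault(key, StateNode(key))
            node.actions.setdefault(action, []).append((reward, success))
            if prev is not None and prev_action is not None:
                prev.children[prev_action] = node
            prev, prev_action = node, action
    return nodes


def decision_weight(node: StateNode) -> float:
    """Assumed dynamic weight: states whose candidate actions diverge in outcome matter more."""
    if len(node.actions) < 2:
        return 0.0
    success_rates = [
        sum(ok for _, ok in outcomes) / len(outcomes)
        for outcomes in node.actions.values()
    ]
    return max(success_rates) - min(success_rates)


def preference_pairs(nodes: Dict[str, StateNode]):
    """Yield (state_key, better_action, worse_action, weight) for preference optimization."""
    for node in nodes.values():
        weight = decision_weight(node)
        if weight <= 0.0:
            continue
        ranked = sorted(
            node.actions.items(),
            key=lambda kv: sum(r for r, _ in kv[1]) / len(kv[1]),  # mean process reward
            reverse=True,
        )
        best, worst = ranked[0][0], ranked[-1][0]
        if best != worst:
            yield node.key, best, worst, weight


if __name__ == "__main__":
    # Two toy trajectories sharing the same first two states; only the first succeeds.
    trajs = [
        ([("Search page", "type 'laptop'", 0.4), ("Results", "click item A", 0.8)], True),
        ([("Search page", "type 'laptop'", 0.4), ("Results", "scroll down", 0.1)], False),
    ]
    merged = merge_trajectories(trajs)
    for key, better, worse, weight in preference_pairs(merged):
        print(f"{key}: prefer {better!r} over {worse!r} (weight={weight:.2f})")
```

In this toy run, only the "Results" state produces a training pair, because it is the one decision point whose branches diverge in outcome; the shared prefix contributes no conflicting labels once its states are merged.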