This paper introduces HEPPO-GAE, an FPGA-based accelerator designed for the Generalized Advantage Estimation (GAE) stage of the Proximal Policy Optimization (PPO) algorithm. Unlike existing approaches that focus on trajectory collection and actor-critic updates, HEPPO-GAE targets the computational demands of GAE through a parallel pipelined architecture implemented on a single system-on-chip (SoC). The design supports hardware accelerators tailored to different PPO stages and improves learning stability and performance through a strategic normalization scheme that combines dynamic reward normalization with block normalization of values, followed by 8-bit uniform quantization. This scheme also alleviates memory bottlenecks, reducing memory usage by 4x and increasing cumulative reward by 1.5x. Implemented on a single SoC device with programmable logic and an embedded processor, the proposed solution delivers substantially higher throughput than conventional CPU-GPU systems and improves PPO learning efficiency by minimizing communication latency and throughput bottlenecks. Experimental results show a 30% increase in PPO training speed and a significant reduction in memory access time, demonstrating the broad applicability of HEPPO-GAE to hardware-efficient reinforcement learning algorithms.
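To make the data flow referred to above concrete, the sketch below gives a plain software reference (NumPy) for the three ingredients the abstract names: the GAE recursion, dynamic (running) reward normalization, and block normalization of values followed by 8-bit uniform quantization. The function names, the Welford-style running statistics, and the per-block min/max quantization grid are illustrative assumptions for exposition only; this is not the paper's hardware pipeline.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Reference GAE recursion: A_t = delta_t + gamma*lam*A_{t+1},
    with delta_t = r_t + gamma*V(s_{t+1}) - V(s_t). `values` holds
    len(rewards)+1 entries so the bootstrap value V(s_T) is included."""
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    last_adv = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    return advantages

class RunningRewardNorm:
    """Dynamic reward normalization (assumed form): Welford running
    variance; each incoming reward is divided by the running std."""
    def __init__(self, eps=1e-8):
        self.count, self.mean, self.m2, self.eps = 0, 0.0, 0.0, eps

    def __call__(self, r):
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)
        if self.count < 2:
            return r  # not enough samples yet to estimate a scale
        std = np.sqrt(self.m2 / self.count) + self.eps
        return r / std

def block_normalize_and_quantize(values, num_levels=256):
    """Block normalization of values followed by 8-bit uniform quantization:
    standardize over the block, then map the block's range onto 256 levels."""
    values = np.asarray(values, dtype=np.float32)
    mean, std = float(values.mean()), float(values.std()) + 1e-8
    normed = (values - mean) / std
    lo, hi = float(normed.min()), float(normed.max())
    scale = (hi - lo) / (num_levels - 1)
    if scale == 0.0:
        scale = 1.0  # constant block: avoid division by zero
    codes = np.clip(np.round((normed - lo) / scale), 0, num_levels - 1)
    return codes.astype(np.uint8), (lo, scale, mean, std)

def dequantize(codes, params):
    """Reconstruction path: codes -> normalized values -> original scale."""
    lo, scale, mean, std = params
    return (codes.astype(np.float32) * scale + lo) * std + mean

# Toy usage: normalize rewards online, quantize critic values, compute GAE.
norm = RunningRewardNorm()
raw_rewards = np.random.randn(128).astype(np.float32)
rewards = np.array([norm(r) for r in raw_rewards], dtype=np.float32)
raw_values = np.random.randn(129).astype(np.float32)   # includes bootstrap value
codes, params = block_normalize_and_quantize(raw_values)
values = dequantize(codes, params)                      # 8-bit-stored values fed to GAE
adv = gae_advantages(rewards, values)
```

Under these assumptions, storing critic values as 8-bit codes instead of 32-bit floats is consistent with the roughly 4x memory-usage reduction cited above.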