This paper proposes the Fisher-Orthogonal Projection (FOP) technique to address the optimization challenges of large-batch training. The large, high-bandwidth memory of modern GPUs now makes it possible to train with mini-batches of tens of thousands of samples, yet existing optimizers struggle to scale effectively to such batch sizes. First-order methods become prone to suboptimal minima because the gradient noise that helps them escape such minima diminishes as the batch size grows. Second-order methods such as KFAC require excessive damping to remain stable at large batch sizes, which degrades their performance. FOP combines the gradients of two sub-batches to construct variance-aware update directions, restoring the effectiveness of second-order optimization at large batch sizes and enabling better generalization and faster convergence.
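
To make the two-sub-batch mechanism concrete, the following is a minimal sketch, not the authors' implementation: it assumes a small parameter vector, an explicit damped Fisher approximation `F`, and two sub-batch gradients `g1` and `g2`; the function name `fop_update` and the scaling conventions are illustrative only, and a practical version would use K-FAC-style factored curvature rather than a dense matrix.

```python
import numpy as np

def fop_update(g1, g2, F, damping=1e-3):
    """Illustrative variance-aware update from two sub-batch gradients.

    g1, g2  -- gradients computed on the two halves of a large batch
    F       -- Fisher information approximation (dense here for simplicity)
    damping -- Tikhonov damping added to keep the solve stable
    """
    F = F + damping * np.eye(F.shape[0])
    g_mean = 0.5 * (g1 + g2)   # average gradient over the full batch
    g_diff = g1 - g2           # difference term capturing gradient variance

    nat_grad = np.linalg.solve(F, g_mean)  # natural-gradient direction F^{-1} g
    nat_diff = np.linalg.solve(F, g_diff)

    # Project the difference direction onto the subspace Fisher-orthogonal to the
    # natural gradient, using the Fisher inner product <u, v>_F = u^T F v.
    coeff = (nat_diff @ F @ nat_grad) / (nat_grad @ F @ nat_grad)
    ortho = nat_diff - coeff * nat_grad

    # Variance-aware update: natural gradient plus its Fisher-orthogonal correction.
    return nat_grad + ortho


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    A = rng.normal(size=(d, d))
    F = A @ A.T / d                  # synthetic positive semi-definite Fisher
    g1 = rng.normal(size=d)
    g2 = rng.normal(size=d)
    print(fop_update(g1, g2, F))
```

Because the correction lies in the Fisher-orthogonal complement of the natural-gradient direction, it adds variance information without interfering with the curvature-preconditioned descent step, which is what allows the sketch above to avoid the heavy damping a plain second-order update would need at large batch sizes.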