This paper introduces Polar Express, a new method for computing the polar decomposition of a matrix, an operation that plays a key role in modern deep-learning optimizers such as Muon. The method uses only matrix-matrix multiplications, making it efficient on GPUs, and, in keeping with deep-learning workloads, prioritizes high throughput over high precision. Building on work by Chen & Chow and by Nakatsukasa & Freund, Polar Express adapts its update rule by solving a minimax optimization problem at each iteration, which makes convergence as fast as possible both in the early iterations and in the asymptotic regime. The method is also designed to remain stable in low-precision arithmetic such as bfloat16. When integrated into the Muon training framework, Polar Express yields consistently lower validation loss than existing methods across a range of learning rates when training a GPT-2 model on one billion tokens from the FineWeb dataset.
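To make the iteration family concrete, the sketch below shows a matmul-only polynomial iteration of the kind Polar Express belongs to: each step applies an odd polynomial p(x) = ax + bx^3 + cx^5 to the singular values of the iterate. The function name is hypothetical, and the fixed coefficients are the quintic Newton-Schulz values popularized by the open-source Muon implementation, not the per-iteration minimax-optimal schedule derived in the paper.

```python
import torch

def polar_factor_sketch(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate the orthogonal polar factor of G using only matmuls.

    A minimal sketch of the odd-polynomial iteration family; the fixed
    coefficients below are illustrative, NOT the paper's per-iteration
    minimax-optimal ones.
    """
    a, b, c = 3.4445, -4.7750, 2.0315  # fixed Newton-Schulz-style coefficients
    X = G.to(torch.bfloat16)           # low-precision, as in Muon-style training
    # Normalize so all singular values lie in (0, 1]; needed for convergence.
    X = X / (X.norm() + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # use the wide orientation so the Gram matrix is smaller
    for _ in range(steps):
        A = X @ X.T               # Gram matrix
        B = b * A + c * (A @ A)   # polynomial in the Gram matrix
        X = a * X + B @ X         # applies p(x) = ax + bx^3 + cx^5 to singular values
    if transposed:
        X = X.T
    return X
```

Because each step touches the singular values only through X @ X.T, the update costs a handful of matmuls per iteration and no SVD or QR factorization, which is what makes this family attractive on GPUs; Polar Express's contribution is choosing the (a, b, c) coefficients optimally at every step rather than fixing them.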