Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.

The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm

Created by
  • Haebom

Author

Noah Amsel, David Persson, Christopher Musco, Robert M. Gower

Polar Express: GPU-Friendly Polar Decomposition for Deep Learning

Outline

This paper introduces Polar Express, a new method for computing the polar decomposition, an operation central to the Muon algorithm for training deep learning models. The method uses only matrix-matrix multiplications, making it efficient on GPUs, and, given the demands of deep learning workloads, prioritizes high throughput over high precision. Building on work by Chen & Chow and Nakatsukasa & Freund, Polar Express adapts its update rule by solving a minimax optimization problem at each iteration, ensuring convergence that is as fast as possible both in the early iterations and in the asymptotic phase. It is also designed to remain stable in low-precision environments such as bfloat16. When integrated into the Muon training framework, Polar Express yields consistent validation-loss reductions over existing methods across a range of learning rates when training a GPT-2 model on 1 billion tokens from the FineWeb dataset.
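The matmul-only idea can be illustrated with the classic cubic Newton–Schulz iteration for the polar factor, which repeatedly applies a fixed odd polynomial to drive all singular values toward 1. Note this sketch uses the standard fixed coefficients (1.5, -0.5) for illustration only; Polar Express instead derives minimax-optimal coefficients per iteration, which is what gives it faster convergence.

```python
import numpy as np

def polar_factor_newton_schulz(G, steps=30):
    """Approximate the orthogonal polar factor of G using only
    matrix-matrix multiplies (cubic Newton-Schulz iteration).
    Illustrative fixed coefficients; Polar Express replaces them
    with per-iteration minimax-optimal ones."""
    # Scale so all singular values lie in (0, 1], ensuring convergence.
    X = G / np.linalg.norm(G)  # Frobenius norm >= spectral norm
    for _ in range(steps):
        # Each step maps every singular value s to 1.5*s - 0.5*s^3,
        # which pushes s toward the fixed point 1.
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

# Usage: the result is (approximately) the nearest orthogonal matrix to G.
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))
X = polar_factor_newton_schulz(G)
```

Because every operation is a dense matmul, the whole loop maps directly onto GPU tensor cores, which is the property the paper exploits; the minimax-optimized coefficients simply reach the same fixed point in fewer iterations.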

Takeaways, Limitations

Takeaways:
Provides an efficient polar decomposition routine for GPU environments, improving deep learning training speed.
Ensures fast convergence via an update rule derived from minimax optimization.
Remains usable in bfloat16, improving computational efficiency in low-precision settings.
Demonstrated superior performance over existing methods in GPT-2 training.
Limitations:
No limitations are directly stated in the paper.