Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

$\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization

Created by
  • Haebom

Author

Yuandong Tian

Outline

This paper proposes a mathematical framework that characterizes how, under what conditions, and for which features grokking (a delayed-generalization phenomenon) occurs on complex inputs. The proposed framework, $\mathbf{Li_2}$, captures the grokking behavior of two-layer nonlinear networks through three stages: (I) lazy learning, (II) independent feature learning, and (III) interactive feature learning. The analysis elucidates the roles of key hyperparameters such as weight decay, learning rate, and sample size in grokking, derives verifiable scaling laws for memorization and generalization, and explains the underlying principles behind the effectiveness of Muon-like optimizers.
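To make the setting concrete, here is a minimal sketch of the kind of experiment the paper studies: a two-layer nonlinear (ReLU) network trained with weight decay on a group-arithmetic task (modular addition). This is an illustrative reconstruction, not the paper's actual code; the group size, hidden width, learning rate, and weight-decay values are assumptions chosen for brevity.

```python
import numpy as np

# Illustrative setup (not the paper's code): modular addition a + b mod p,
# a standard group-arithmetic task for studying grokking.
rng = np.random.default_rng(0)
p = 7                                  # group size (illustrative)
X = np.array([[a, b] for a in range(p) for b in range(p)])
Y = (X[:, 0] + X[:, 1]) % p

# One-hot encode the two operands side by side: shape (p*p, 2p)
enc = np.zeros((len(X), 2 * p))
enc[np.arange(len(X)), X[:, 0]] = 1.0
enc[np.arange(len(X)), p + X[:, 1]] = 1.0

h = 64                                 # hidden width (illustrative)
W1 = rng.normal(0.0, 0.5, (2 * p, h))
W2 = rng.normal(0.0, 0.5, (h, p))
lr, wd = 0.5, 1e-3                     # learning rate and weight decay (assumed values)

def forward(E):
    """Two-layer ReLU network followed by softmax."""
    a1 = np.maximum(E @ W1, 0.0)
    logits = a1 @ W2
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return a1, probs

def xent(probs):
    return -np.log(probs[np.arange(len(Y)), Y] + 1e-12).mean()

_, probs0 = forward(enc)
init_loss = xent(probs0)

for _ in range(500):                   # full-batch gradient descent
    a1, probs = forward(enc)
    d_logits = probs.copy()
    d_logits[np.arange(len(Y)), Y] -= 1.0
    d_logits /= len(Y)
    dW2 = a1.T @ d_logits
    d_a1 = d_logits @ W2.T
    d_a1[a1 <= 0.0] = 0.0              # ReLU gradient mask
    dW1 = enc.T @ d_a1
    # Weight decay enters the update directly, as in the paper's analysis
    # of its role in grokking.
    W2 -= lr * (dW2 + wd * W2)
    W1 -= lr * (dW1 + wd * W1)

_, probs1 = forward(enc)
final_loss = xent(probs1)
```

In a grokking study one would track train and held-out accuracy over many more steps, varying weight decay and sample size to observe the delayed transition from memorization to generalization; the sketch above only sets up the training loop.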

Takeaways, Limitations

Takeaways:
A proposed $\mathbf{Li_2}$ framework that captures the three stages of the grokking phenomenon.
Analysis of the impact of weight decay, learning rate, and sample size on grokking.
Derivation of verifiable scaling laws for memorization and generalization.
Explanation of the effectiveness of optimizers such as Muon from the perspective of gradient dynamics.
Suggesting the possibility of expansion to a multi-layer architecture.
Limitations:
No specific limitations are explicitly mentioned in the abstract.
Although extension to multi-layer architectures is suggested, no actual extension study is included.
The experimental setting is limited to group-arithmetic tasks.