We propose a novel framework, $\mathbf{Li_2}$, to explain the grokking phenomenon, i.e., delayed generalization. The framework captures the grokking behavior of two-layer nonlinear networks in three stages: (I) lazy learning, (II) independent feature learning, and (III) interactive feature learning, whose initials give the framework its name. $\mathbf{Li_2}$ characterizes how key hyperparameters, such as weight decay, learning rate, and sample size, affect grokking; it also yields provable scaling laws for feature emergence, memorization, and generalization, and demonstrates the effectiveness of state-of-the-art optimizers such as Muon.
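
To make the setting concrete, below is a minimal sketch (not the paper's code) of the canonical grokking experiment such a framework analyzes: a two-layer nonlinear network trained with weight decay on modular addition, where training accuracy saturates long before test accuracy rises. The modulus, hidden width, learning rate, weight decay, and training fraction are illustrative assumptions, not values from the paper.

\begin{verbatim}
import torch
import torch.nn as nn

torch.manual_seed(0)
P = 23        # modulus for the (a + b) mod P task (assumed value)
HIDDEN = 256  # width of the single hidden layer (assumed value)
FRAC = 0.5    # fraction of all P*P pairs used for training (assumed)

# Full dataset of (a, b) -> (a + b) mod P with one-hot inputs.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
x = torch.cat([nn.functional.one_hot(pairs[:, 0], P),
               nn.functional.one_hot(pairs[:, 1], P)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % P

perm = torch.randperm(len(x))
n_train = int(FRAC * len(x))
train, test = perm[:n_train], perm[n_train:]

# Two-layer nonlinear network: one hidden layer with a ReLU,
# as in typical grokking studies.
model = nn.Sequential(nn.Linear(2 * P, HIDDEN),
                      nn.ReLU(),
                      nn.Linear(HIDDEN, P))

# Weight decay is one of the key hyperparameters the framework
# analyzes; without it, delayed generalization may not occur.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(x[train]), y[train])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            tr = (model(x[train]).argmax(1) == y[train]).float().mean()
            te = (model(x[test]).argmax(1) == y[test]).float().mean()
        # Grokking shows up as train accuracy saturating long before
        # test accuracy jumps (memorization, then generalization).
        print(f"step {step:6d}  train acc {tr:.3f}  test acc {te:.3f}")
\end{verbatim}

Varying the weight decay, learning rate, or training fraction in this sketch is one way to probe the hyperparameter dependence the abstract refers to; the three stages of $\mathbf{Li_2}$ describe phases of the resulting training dynamics rather than explicit steps in the code.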