Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

The Features at Convergence Theorem: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations

Created by
  • Haebom

Author

Enric Boix-Adsera, Neil Mallinar, James B. Simon, Mikhail Belkin

Outline

This paper addresses a central challenge in deep learning: understanding how neural networks learn representations. A key reference point is the Neural Feature Ansatz (NFA) (Radhakrishnan et al., 2024), a conjecture about the mechanism by which feature learning occurs. Although well supported empirically, the NFA lacks a theoretical grounding, so it is unclear when it might fail and how it could be improved. This paper takes a first-principles approach to understand why the observation holds and when it does not.

Starting from first-order optimality conditions at convergence, the authors derive the Features at Convergence Theorem (FACT), an alternative to the NFA. FACT (a) agrees more closely with the learned features at convergence, (b) explains why the NFA holds in most settings, and (c) captures essential feature-learning phenomena in neural networks, such as grokking in modular arithmetic and phase transitions in sparse parity learning, just as the NFA does. The results therefore connect the theoretical, first-order optimization analysis of neural networks with the empirically driven NFA literature, providing a principled alternative that is verifiable and empirically valid at convergence.
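To make the object of study concrete, below is a minimal, hypothetical sketch of the kind of check the NFA literature performs: train a small MLP on toy data, then compare the first-layer neural feature matrix W1^T W1 with the average gradient outer product (AGOP) of the network output with respect to its inputs. The architecture, hyperparameters, synthetic task, and the use of a flattened Pearson correlation as the agreement measure are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (hypothetical setup): measuring NFA-style agreement between the
# first-layer neural feature matrix W1^T W1 and the AGOP on a toy regression task.
import torch

torch.manual_seed(0)
n, d, width = 512, 10, 128

# Synthetic data: the target depends only on the first two coordinates,
# so a feature-learning network should emphasize those directions.
X = torch.randn(n, d)
y = (X[:, 0] * X[:, 1]).unsqueeze(1)

model = torch.nn.Sequential(
    torch.nn.Linear(d, width, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1, bias=False),
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
for _ in range(3000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()

# Neural feature matrix of the first layer: W1^T W1 (shape d x d).
W1 = model[0].weight.detach()          # shape (width, d)
nfm = W1.T @ W1

# AGOP: average over inputs of grad_x f(x) grad_x f(x)^T (shape d x d).
X_req = X.clone().requires_grad_(True)
grads = torch.autograd.grad(model(X_req).sum(), X_req)[0]   # per-sample input gradients, (n, d)
agop = grads.T @ grads / n

# Pearson correlation between the flattened matrices, a common agreement measure.
def corr(a, b):
    a, b = a.flatten(), b.flatten()
    a, b = a - a.mean(), b - b.mean()
    return (a @ b / (a.norm() * b.norm())).item()

print("correlation(W1^T W1, AGOP):", corr(nfm, agop))
```

The paper's contribution, as summarized above, is a different right-hand side for this kind of comparison: FACT replaces the empirically motivated AGOP target with a quantity derived from first-order optimality at convergence, which the authors report matches the learned W^T W more closely.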

Takeaways, Limitations

Takeaways:
Deepens the theoretical understanding of feature learning in neural networks.
Presents the Features at Convergence Theorem (FACT) as an alternative to the NFA, explaining why the NFA holds and where its limitations lie.
FACT captures important feature-learning phenomena in neural networks, such as grokking in modular arithmetic and the phase transition in sparse parity learning.
Connects first-order optimization analysis with the empirical NFA literature.
Limitations:
FACT is not guaranteed to outperform the NFA in every setting.
Further research is needed on the applicability and generalizability of FACT.
Its applicability to large-scale, complex neural networks still needs to be verified.