This paper addresses a central challenge in deep learning: understanding how neural networks learn representations. A key approach is the Neural Feature Ansatz (NFA) (Radhakrishnan et al. 2024), a conjecture about the mechanism by which feature learning occurs. While empirically well supported, the NFA lacks a theoretical basis, making it unclear when it might fail and how it might be improved. This paper takes a first-principles approach to understanding why the ansatz holds and when it does not. Using first-order optimality conditions, we derive the Features at Convergence Theorem (FACT), an alternative to the NFA. FACT (a) achieves greater agreement with learned features at convergence, (b) explains why the NFA holds in most settings, and (c) captures essential feature-learning phenomena in neural networks, such as grokking in modular arithmetic and phase transitions in sparse parity learning, just as the NFA does. These results thus connect the first-order optimization theory of neural networks with the empirically driven NFA literature, and provide a principled alternative that is verifiable and empirically valid at convergence.
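
The sketch below is a minimal illustration, not taken from the paper, of the kind of agreement check the abstract refers to: following Radhakrishnan et al. (2024), the NFA posits that a layer's neural feature matrix $W^\top W$ is strongly correlated with the average gradient outer product (AGOP) of the network taken with respect to that layer's input. All architecture and training choices here (a toy two-layer MLP, synthetic data, Adam) are illustrative assumptions.

```python
# Minimal sketch (illustrative, assumed setup): checking NFA-style agreement
# between the first layer's neural feature matrix W^T W and the AGOP.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data and a small two-layer MLP; sizes are arbitrary.
n, d, h = 256, 10, 64
X = torch.randn(n, d)
y = (X[:, 0] * X[:, 1]).unsqueeze(1)  # simple nonlinear target

model = nn.Sequential(
    nn.Linear(d, h, bias=False), nn.ReLU(), nn.Linear(h, 1, bias=False)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Train toward (approximate) convergence on squared loss.
for _ in range(2000):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()

# Neural feature matrix of the first layer: W^T W, shape (d, d).
W = model[0].weight.detach()  # shape (h, d)
nfm = W.T @ W

# AGOP: average over inputs of grad_x f(x) grad_x f(x)^T, shape (d, d).
Xg = X.clone().requires_grad_(True)
grads = torch.autograd.grad(model(Xg).sum(), Xg)[0]  # per-sample input gradients
agop = grads.T @ grads / n

# Pearson correlation between the two flattened matrices.
def corr(A, B):
    a, b = A.flatten(), B.flatten()
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / (a.norm() * b.norm())

print(f"NFA correlation: {corr(nfm, agop).item():.3f}")  # typically close to 1
```

The same correlation-style diagnostic could in principle be used to compare any proposed feature characterization (such as FACT, whose precise statement is given in the paper rather than here) against the features actually learned at convergence.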