This page organizes papers related to artificial intelligence published around the world. This page is summarized using Google Gemini and is operated on a non-profit basis. The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.
Dual Perspectives on Non-Contrastive Self-Supervised Learning
Created by
Haebom
Authors
Jean Ponce (ENS-PSL, NYU), Basile Terver (FAIR, WILLOW), Martial Hebert (CMU), Michael Arbel (Thoth)
Outline
This paper analyzes the "stop gradient" and "exponential moving average" iterative procedures, which are used to prevent representation collapse in non-contrastive approaches to self-supervised learning, from the dual perspectives of optimization and dynamical systems. We show that, although these procedures in general optimize neither the original objective function nor any other smooth function, they do avoid representation collapse. Using the dynamical systems perspective in the linear setting, we prove that minimizing the original objective function without "stop gradient" or "exponential moving average" always leads to representation collapse. Furthermore, we explicitly characterize the equilibrium points of the dynamical systems associated with these two procedures in the linear setting and show that they are, in general, asymptotically stable.
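The difference between the update rules discussed above can be sketched in a few lines of NumPy. This is only an illustrative toy setup, not the paper's formulation: the linear encoder `W`, linear predictor `P`, target encoder `W_target`, learning rate `lr`, momentum `tau`, and the squared-error loss between the two views are all assumptions made for this example. Joint descent differentiates through both branches, "stop gradient" treats the target branch as a constant, and "exponential moving average" replaces the target encoder with a slowly updated copy of the online one.

```python
import numpy as np

def residual(W, P, W_target, x1, x2):
    # Prediction from view 1 minus target from view 2; shape (n, k).
    return x1 @ W.T @ P.T - x2 @ W_target.T

def loss(W, P, W_target, x1, x2):
    # 0.5 * mean over samples of ||P W x1 - W_target x2||^2 (assumed toy loss).
    r = residual(W, P, W_target, x1, x2)
    return 0.5 * np.mean(np.sum(r ** 2, axis=1))

def grads(W, P, W_target, x1, x2):
    # Analytic gradients of the toy loss with respect to P, the online
    # encoder W, and the target encoder W_target, each taken separately.
    n = x1.shape[0]
    r = residual(W, P, W_target, x1, x2)
    g_P = r.T @ (x1 @ W.T) / n
    g_online = P.T @ r.T @ x1 / n
    g_target = -r.T @ x2 / n
    return g_P, g_online, g_target

def step_joint(W, P, x1, x2, lr):
    # Plain gradient descent on the original objective: the target encoder
    # is W itself, so its gradient is added to the online gradient.
    g_P, g_on, g_tg = grads(W, P, W, x1, x2)
    return W - lr * (g_on + g_tg), P - lr * g_P

def step_stop_gradient(W, P, x1, x2, lr):
    # "Stop gradient": the target branch is evaluated at W but treated as a
    # constant, so only the online-branch gradient updates W.
    g_P, g_on, _ = grads(W, P, W, x1, x2)
    return W - lr * g_on, P - lr * g_P

def step_ema(W, P, W_target, x1, x2, lr, tau):
    # "Exponential moving average": the target encoder is a separate copy
    # that receives no gradient and slowly tracks the online encoder.
    g_P, g_on, _ = grads(W, P, W_target, x1, x2)
    W_new = W - lr * g_on
    W_target_new = tau * W_target + (1.0 - tau) * W_new
    return W_new, P - lr * g_P, W_target_new
```

In this toy loss, the collapsed encoder `W = 0` gives a loss of exactly zero, which is why direct minimization can be drawn toward it. The stop-gradient and EMA steps do not follow the gradient of this loss with respect to `W`, which matches the summary's point that they are not ordinary descent on the original objective.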
Takeaways, Limitations
• We show theoretically that "stop gradient" and "exponential moving average" do not actually optimize the original objective function, yet are effective in preventing representation collapse.
• We provide a rigorous mathematical proof of the effects of "stop gradient" and "exponential moving average" in the linear setting.
• We characterize the equilibrium points of the dynamical systems associated with "stop gradient" and "exponential moving average" and analyze their stability.
• This study focuses on theoretical analysis, and further research is needed on generalization to other types of self-supervised learning techniques and to nonlinear settings.
• The theoretical results are verified through experiments using real and synthetic data.