Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

GradMetaNet: An Equivariant Architecture for Learning on Gradients

Created by
  • Haebom

Author

Yoav Gelberg, Yam Eitan, Aviv Navon, Aviv Shamsian, Theo Putterman, Michael Bronstein, Haggai Maron

GradMetaNet: An Architecture for Gradient Learning

Outline

This paper presents a principled approach to designing architectures that process gradients directly, motivated by the central role gradients play in optimizing, editing, and analyzing neural network models. Specifically, it introduces GradMetaNet, built on three principles: (1) an equivariant design that preserves neuron permutation symmetry, (2) processing gradients from multiple data points to capture curvature information, and (3) an efficient gradient representation via rank-1 decomposition. GradMetaNet is constructed from simple equivariant blocks; the authors prove universality results and show that previous approaches cannot approximate natural gradient-based functions that GradMetaNet can express. They further demonstrate that GradMetaNet is effective on a range of gradient-based tasks for MLPs and Transformers, including learned optimization, INR editing, and loss landscape curvature estimation.
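To make principles (1) and (3) concrete, here is a minimal PyTorch sketch; this is our own illustration, not code from the paper. For a single linear layer, the per-example weight gradient is a rank-1 outer product, and permuting the neurons permutes the rows of the gradient in lockstep, which is the symmetry an equivariant gradient-processing architecture must respect.

```python
import torch

# Toy sketch (illustration only, not the paper's implementation).
# For a linear layer y = W x with loss L, the per-example weight
# gradient dL/dW is the rank-1 outer product (dL/dy) x^T -- the
# structure that principle (3) exploits.

torch.manual_seed(0)
W = torch.randn(4, 3, requires_grad=True)
x = torch.randn(3)
target = torch.randn(4)

y = W @ x
loss = 0.5 * ((y - target) ** 2).sum()
loss.backward()

delta = y.detach() - target        # dL/dy for squared error
rank1 = torch.outer(delta, x)      # rank-1 reconstruction of dL/dW
assert torch.allclose(W.grad, rank1, atol=1e-6)

# Neuron permutation symmetry (principle (1)): permuting the output
# neurons (rows of W, with the target permuted accordingly) permutes
# the rows of the gradient in exactly the same way.
perm = torch.randperm(4)
W_p = W.detach()[perm].clone().requires_grad_(True)
y_p = W_p @ x
loss_p = 0.5 * ((y_p - target[perm]) ** 2).sum()
loss_p.backward()
assert torch.allclose(W_p.grad, W.grad[perm], atol=1e-6)
```

Both assertions pass: the gradient is exactly rank-1 per data point, and it transforms equivariantly under neuron relabeling, which motivates both the rank-1 representation and the permutation-equivariant blocks described above.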

Takeaways, Limitations

Takeaways:
A principled approach to designing gradient-processing architectures.
Theoretical guarantees (universality) for GradMetaNet.
Demonstrated effectiveness of GradMetaNet on diverse gradient-based tasks (learned optimization, INR editing, and loss landscape curvature estimation).
Identification of the limitations of existing gradient-processing methods.
Limitations:
The summary alone does not provide implementation details or full experimental results.
No information on the complexity and computational cost of the architecture.
No comparative analysis against other recent architectures.
No information on performance limits for specific datasets or tasks.