Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Stochastic Layer-wise Learning: Scalable and Efficient Alternative to Backpropagation

Created by
  • Haebom

Author

Bojian Yin, Federico Corradi

Outline

To address the limitations of backpropagation (BP), which relies on global gradient synchronization, this paper presents Stochastic Layer-wise Learning (SLL), a training algorithm that coordinates updates layer by layer while maintaining global representational consistency. Inspired by the Evidence Lower Bound (ELBO) and based on a Markov assumption over the network's layers, SLL lets each layer optimize a local objective through a deterministic encoder. The intractable KL-divergence term of the ELBO is replaced by a Bhattacharyya surrogate computed on an auxiliary categorical posterior obtained through a random projection that preserves a fixed geometry, and optional multiplicative dropout provides stochastic regularization. By optimizing locally and aligning globally, SLL eliminates backpropagation between layers. Experiments on MLPs, CNNs, and Vision Transformers, ranging from MNIST to ImageNet, show that SLL outperforms recent local-learning methods while keeping a memory footprint comparable to global BP regardless of depth.
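
To make the local-update structure concrete, below is a minimal PyTorch sketch of layer-wise training in the spirit of SLL: each layer receives a detached input, maps its output to class logits through a fixed random projection, and minimizes a Bhattacharyya-style surrogate against one-hot labels. The layer sizes, learning rate, and exact surrogate form are illustrative assumptions rather than the paper's implementation, and the optional multiplicative dropout is omitted.

```python
# Minimal sketch of layer-wise local training in the spirit of SLL.
# Assumptions (not from the paper): layer sizes, learning rate, and the exact
# surrogate form are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, batch, dims = 10, 32, [784, 512, 256, 128]

# One trainable block per layer, each with its own optimizer (no global BP).
layers = [nn.Sequential(nn.Linear(i, o), nn.ReLU())
          for i, o in zip(dims[:-1], dims[1:])]
optims = [torch.optim.Adam(l.parameters(), lr=1e-3) for l in layers]

# Fixed, non-trainable random projections give each layer an auxiliary
# categorical posterior over the classes with a fixed geometry.
projs = [torch.randn(o, num_classes) / o ** 0.5 for o in dims[1:]]

def bhattacharyya_surrogate(logits, targets):
    # Negative log Bhattacharyya coefficient between the auxiliary posterior
    # and a one-hot label distribution; for one-hot targets this is -0.5*log p_y.
    p = F.softmax(logits, dim=-1)
    return -0.5 * torch.log(p.gather(1, targets[:, None]).clamp_min(1e-12)).mean()

x = torch.randn(batch, dims[0])              # stand-in for an input batch
y = torch.randint(0, num_classes, (batch,))  # stand-in for labels

h = x
for layer, proj, opt in zip(layers, projs, optims):
    h = layer(h.detach())                    # detach: no gradient reaches earlier layers
    loss = bhattacharyya_surrogate(h @ proj, y)
    opt.zero_grad()
    loss.backward()                          # gradient stays local to this layer
    opt.step()
```

Because each block only backpropagates through its own parameters, per-layer memory for activations does not accumulate with depth, which is the source of the depth-independent memory behavior noted below.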

Takeaways, Limitations

Takeaways:
  • Demonstrates that efficient local learning is possible without global gradient synchronization.
  • Improves model scalability through better memory usage and computational efficiency.
  • Achieves comparable or better performance than existing methods on datasets ranging from MNIST to ImageNet.
  • Depth-independent memory usage makes it suitable for training large-scale models.
Limitations:
  • As an ELBO-based method, its performance may depend on the validity of the Markov assumption.
  • The Bhattacharyya surrogate may have a potential impact on accuracy.
  • Relies on additional techniques such as random projections and dropout.