Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Dion: Distributed Orthonormalized Updates

Created by
  • Haebom

Authors

Kwangjun Ahn, Byron Xu, Natalie Abreu, Ying Fan, Gagik Magakyan, Pratyusha Sharma, Zheng Zhan, John Langford

Outline

This paper proposes Dion, a novel optimization algorithm that makes orthonormalized updates efficient for large-scale language model (LLM) training. Existing orthogonalization methods rely on dense matrix operations, making them inefficient when training large LLMs with partitioned weights. Dion addresses this by replacing Newton-Schulz iterations with an amortized power iteration applied to the momentum buffer. It avoids full-matrix reconstruction and integrates seamlessly with weight partitioning, while a rank-fraction parameter combined with error feedback trades off update quality against compute and communication cost. Experiments on language models ranging from 160 million to 3 billion parameters show that Dion achieves significant speedups at large scale while preserving the benefits of orthonormalized updates.
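To make the idea concrete, here is a minimal NumPy sketch of a low-rank orthonormalized update computed by one power-iteration step with error feedback. This is an illustration of the general technique, not the paper's exact algorithm: the function name, the plain QR orthonormalization, and the simple residual subtraction are assumptions for clarity, and distributed sharding is omitted entirely.

```python
import numpy as np

def orthonormalize(X):
    # Column-orthonormalize X via QR decomposition.
    Q, _ = np.linalg.qr(X)
    return Q

def low_rank_orthonormal_update(B, Q_prev):
    """One amortized power-iteration step (illustrative sketch).

    B:      momentum buffer, shape (m, n)
    Q_prev: right factor from the previous step, shape (n, r);
            warm-starting it is what amortizes the power iteration.
    """
    P = orthonormalize(B @ Q_prev)   # left factor, column-orthonormal (m, r)
    R = B.T @ P                      # right factor capturing B along P (n, r)
    B_next = B - P @ R.T             # error feedback: keep the unexpressed residual
    Q_next = orthonormalize(R)       # orthonormal right factor, reused next step
    update = P @ Q_next.T            # rank-r update with all singular values equal to 1
    return update, Q_next, B_next
```

Because both factors are column-orthonormal, the resulting update has exactly r unit singular values, mimicking orthogonalized updates at a fraction of the cost; the rank r here plays the role of the rank-fraction parameter.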

Takeaways, Limitations

Takeaways:
Significantly improves the efficiency of orthonormalized updates in large-scale LLM training.
Reduces the high computational and communication costs of existing orthogonalization methods.
Achieves scalability through seamless integration with weight partitioning.
The rank-fraction parameter allows a trade-off between update quality and cost.
Provides a practical optimization algorithm for next-generation foundation models.
Limitations:
Experiments are limited to models of up to 3 billion parameters; evaluation on larger models is still needed.
Choosing the optimal value of the rank-fraction parameter may require further study.
A more in-depth comparative analysis against other optimization algorithms is needed.