This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is run on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
This paper proposes Dion, a novel optimization algorithm that improves the efficiency of orthogonalized updates in large-scale language model (LLM) training. Existing orthogonalization methods rely on dense matrix operations, making them inefficient for large-scale LLM training with partitioned weights. Dion addresses this by replacing Newton-Schulz iterations with amortized power iteration applied to the momentum buffer. It avoids reconstructing full matrices and integrates seamlessly with weight partitioning, while a rank-fraction parameter and error feedback balance update quality against cost. Experiments on language models ranging from 160 million to 3 billion parameters show that Dion achieves significant speedups at large scale while retaining the benefits of orthogonalized updates.
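The sketch below is a minimal NumPy illustration of the kind of update the summary describes: one amortized power-iteration step on a momentum buffer, a thin QR in place of a full-matrix Newton-Schulz orthogonalization, and error feedback that keeps the residual in the buffer. The function name `dion_like_step`, the hyperparameters `lr` and `mu`, the `sqrt(m/n)` scaling, and the initialization are illustrative assumptions, not the paper's exact algorithm or code.

```python
import numpy as np

def dion_like_step(W, G, M, Q, lr=0.01, mu=0.95):
    """One illustrative orthogonalized low-rank update step (a sketch, not the paper's exact algorithm).

    W : (m, n) weight matrix
    G : (m, n) gradient
    M : (m, n) momentum / error-feedback buffer
    Q : (n, r) right factor carried across steps (amortized power iteration)
    """
    m, n = W.shape

    # Fold the new gradient into the buffer.
    B = M + G

    # One power-iteration step: left factor from the buffer, orthonormalized
    # with a thin QR instead of Newton-Schulz on the full matrix.
    P, _ = np.linalg.qr(B @ Q)      # (m, r), orthonormal columns
    R = B.T @ P                     # (n, r), new right factor

    # Error feedback: the part of B not captured by the rank-r update
    # stays in the momentum buffer instead of being discarded.
    M_new = B - (1.0 - mu) * (P @ R.T)

    # Orthonormalize the right factor for the update and for reuse next step.
    Q_new, _ = np.linalg.qr(R)      # (n, r), orthonormal columns

    # Scaled orthonormal update (the sqrt(m/n) scaling is an assumption).
    W_new = W - lr * np.sqrt(m / n) * (P @ Q_new.T)
    return W_new, M_new, Q_new


# Usage sketch: the rank r = rank_fraction * min(m, n) sets the quality/cost trade-off.
m, n, rank_fraction = 1024, 1024, 0.25
r = max(1, int(rank_fraction * min(m, n)))
W = 0.02 * np.random.randn(m, n)
M = np.zeros((m, n))
Q = np.linalg.qr(np.random.randn(n, r))[0]
G = np.random.randn(m, n)           # stand-in for a real gradient
W, M, Q = dion_like_step(W, G, M, Q)
```

Because only the low-rank factors P and Q need to be exchanged and the residual is carried forward by error feedback, this style of update avoids materializing or communicating full dense matrices, which is what makes it attractive for training with partitioned weights.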
Takeaways, Limitations
• Takeaways:
◦ Significantly improves the efficiency of orthogonalized updates in large-scale LLM training.
◦ Addresses the high computational and communication costs of existing orthogonalization methods.
◦ Achieves scalability through seamless integration with weight partitioning.
◦ The rank-fraction parameter allows update quality and cost to be balanced.
◦ Provides a practical optimization algorithm for next-generation foundation models.
• Limitations:
◦ Experiments are limited to models of up to 3 billion parameters; evaluation on larger models is still needed.
◦ Further research may be needed to determine how best to choose the rank-fraction parameter.
◦ A more in-depth comparative analysis against other optimization algorithms is needed.