CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
Created by
Haebom
Author
Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Mingsong Yan, Zi Yang, Paul Hovland, Bogdan Nicolae, Franck Cappello, Sui Tang, Zheng Zhang
Outline
This paper proposes CoLA and its memory-efficient variant CoLA-M, which exploit the low-rank nature of activations to address the excessive compute consumed by the full-size MLP and attention projection layers during pre-training of large language models (LLMs), replacing them with auto-encoder-style low-rank structures. CoLA reduces computational cost and improves training throughput, while CoLA-M additionally reduces memory cost. Experiments with LLaMA models show that CoLA cuts computational cost by 2x and improves training throughput by 1.86x, and that CoLA-M offers combined gains in parameter, compute, and memory efficiency. The resulting models are also 2x smaller, enabling faster inference with lower memory cost on resource-constrained platforms.
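To make the core idea concrete, below is a minimal PyTorch sketch of an auto-encoder-style low-rank bottleneck layer in the spirit of CoLA: a full-rank linear map is replaced by a down-projection, a nonlinearity on the rank-r activation, and an up-projection. The class name, the rank choice, and the SiLU nonlinearity here are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a low-rank bottleneck layer in the spirit of CoLA.
# Layer names, the rank value, and the SiLU nonlinearity are assumptions
# made for illustration; see the paper for the actual architecture.
import torch
import torch.nn as nn


class LowRankActivationLinear(nn.Module):
    """Replaces a full d_in x d_out linear map with two narrow linears
    and a nonlinearity applied to the rank-r activation: y = B(sigma(A x))."""

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # A: d_in -> r
        self.act = nn.SiLU()                           # nonlinearity on the low-rank activation
        self.up = nn.Linear(rank, d_out, bias=False)   # B: r -> d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))


# Rough cost intuition: a full linear costs O(d_in * d_out) multiply-adds per
# token, while the factored version costs O(r * (d_in + d_out)), which is where
# the compute and parameter savings come from when r is well below d_in, d_out.
full = nn.Linear(4096, 4096, bias=False)
cola_like = LowRankActivationLinear(4096, 4096, rank=1024)
x = torch.randn(2, 8, 4096)
print(full(x).shape, cola_like(x).shape)  # both: torch.Size([2, 8, 4096])
```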
Takeaways, Limitations
•
Takeaways:
◦
CoLA and CoLA-M are proposed as novel architectures that improve the computational and memory efficiency of LLM pre-training.
◦
Experiments with the LLaMA model demonstrate the performance and efficiency of CoLA (2x reduction in computational cost, 1.86x improvement in training throughput).
◦
Further improvements to memory efficiency through CoLA-M.
◦
Faster inference and lower memory usage thanks to the 2x smaller model size.
•
Limitations:
◦
No specific experimental data (e.g., experimental results for different model sizes and datasets) are presented.
◦
Further research is needed on the generalization performance of CoLA and CoLA-M.
◦
Lack of comparative analysis with other techniques that exploit low-rank structure.