Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression

Created by
  • Haebom

Author

Muchammad Daniyal Kautsar, Afra Majida Hariono, Widyawan, Syukron Abu Ishaq Alfarozi, Kuntpong Woraratpanya

Outline

This paper proposes Corrective Adaptive Low-Rank Decomposition (CALR), a method that improves on low-rank compression via singular value decomposition (SVD) to address the challenges of deploying large language models (LLMs), particularly their size and computational demands. Existing SVD-based compression methods focus on minimizing weight reconstruction error, which does not preserve functional performance; CALR addresses this by combining SVD-compressed layers with parallel low-rank correction modules trained to recover the functional residual error. Experiments on SmolLM2-135M, Qwen3-0.6B, and Llama-3.2-1B show that CALR reduces parameter counts by 26.93% to 51.77% while retaining 59.45% to 90.42% of the original models' performance, outperforming existing methods such as LaCo, ShortGPT, and LoSparse. This demonstrates that treating functional information loss as a learnable signal is an effective compression paradigm.
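
As a rough illustration of the idea described above (not the authors' implementation), the sketch below compresses a single linear layer with truncated SVD and attaches a small parallel low-rank path intended to be trained on the layer's residual error. The class and parameter names (CALRLinear, rank, corr_rank) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CALRLinear(nn.Module):
    """SVD-compressed linear layer with a parallel low-rank correction path (illustrative sketch)."""

    def __init__(self, weight: torch.Tensor, rank: int, corr_rank: int):
        super().__init__()
        out_features, in_features = weight.shape
        # Truncated SVD of the original weight: W ≈ U_r diag(S_r) V_r^T.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.down = nn.Linear(in_features, rank, bias=False)   # acts as diag(S_r) V_r^T
        self.up = nn.Linear(rank, out_features, bias=False)    # acts as U_r
        with torch.no_grad():
            self.down.weight.copy_(S[:rank].unsqueeze(1) * Vh[:rank])
            self.up.weight.copy_(U[:, :rank])
        # Parallel correction module, meant to be trained afterwards so that
        # its output recovers the functional residual error of the SVD path.
        self.corr_down = nn.Linear(in_features, corr_rank, bias=False)
        self.corr_up = nn.Linear(corr_rank, out_features, bias=False)
        nn.init.zeros_(self.corr_up.weight)  # correction starts at zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compressed projection plus learned correction.
        return self.up(self.down(x)) + self.corr_up(self.corr_down(x))


# Usage: replace a dense layer's weight with its compressed counterpart.
dense = nn.Linear(512, 512, bias=False)
compressed = CALRLinear(dense.weight.detach(), rank=64, corr_rank=16)
x = torch.randn(2, 512)
print(compressed(x).shape)  # torch.Size([2, 512])
```

In such a setup, only the correction factors would be trained while the SVD factors stay frozen, which keeps the number of trainable parameters small; how the paper schedules or supervises this training is not covered by this sketch.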

Takeaways, Limitations

Takeaways:
A novel compression paradigm that treats functional information loss as a learnable signal.
Development of the CALR algorithm, which outperforms existing low-rank decomposition techniques.
Improved real-world deployment potential by reducing the size and computational demands of LLMs.
Broader use of LLMs in resource-constrained environments.
Limitations:
The experimental results are limited to a small set of models, so further research on generalizability is needed.
Analysis of the computational cost and time required to train the CALR correction module is needed.
Additional experiments and performance evaluations are needed for LLMs of various sizes and types.