Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models

Created by
  • Haebom

Author

Yuval Weiss, David Demitri Africa, Paula Buttery, Richard Diehl Martinez

Outline

Parameter-efficient methods like LoRA have revolutionized fine-tuning of large language models (LLMs). ReLoRA extends this idea to pretraining by repeatedly merging and reinitializing low-rank adapters, increasing the cumulative rank of the update while keeping per-step costs low. This approach aligns with the observation that high-capacity models learn through locally low-rank trajectories that expand over time. In contrast, recent research has shown that small language models (SLMs) exhibit rank deficiencies and underutilize their available dimensions. This raises the question of whether ReLoRA's rank-expanding update rule can alleviate the rank bottleneck by "guiding" SLMs toward healthier learning dynamics in capacity-constrained settings. The authors argue that SLMs are an ideal testbed because they train quickly, enable controlled ablations, and make rank phenomena easier to measure. They present the first systematic study of ReLoRA on SLMs with 11M-66M parameters, evaluating both performance and learning dynamics. They find that ReLoRA underperforms full-rank training on loss, Paloma perplexity, and BLiMP, and that the gap widens with scale. Analysis of proportional effective rank and condition number reveals that ReLoRA amplifies existing rank deficiencies and induces ill-conditioned updates early in training. These results indicate that while ReLoRA's merge-and-restart strategy can expand ranks in larger models, it does not transfer directly to capacity-constrained SLMs, motivating adaptive-rank or hybrid-rank approaches for low-compute pretraining.
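The merge-and-restart mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation: the layer width, rank, restart interval, and the stand-in "gradient step" are all illustrative assumptions; only the structure (train a low-rank adapter, periodically merge it into the frozen weight, then reinitialize) follows the ReLoRA recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4                             # layer width and adapter rank (illustrative)
W = rng.normal(size=(d, d)) * 0.02       # frozen base weight matrix

def init_adapter():
    # LoRA-style init: A random, B zero, so the adapter starts as a no-op
    A = rng.normal(size=(r, d)) * 0.02
    B = np.zeros((d, r))
    return A, B

A, B = init_adapter()
cumulative_rank = 0

for step in range(1, 301):
    # Stand-in for an optimizer step on A and B only; W stays frozen
    A += rng.normal(size=A.shape) * 1e-3
    B += rng.normal(size=B.shape) * 1e-3

    if step % 100 == 0:                  # ReLoRA restart interval (assumed)
        W += B @ A                       # merge the rank-r update into W
        cumulative_rank += r             # each merged adapter can add up to r directions
        A, B = init_adapter()            # reinitialize and keep training
```

Each restart caps the per-interval update at rank r, but the merged updates can span different subspaces, which is how the cumulative rank of W's total change can exceed r.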

Takeaways, Limitations

ReLoRA underperforms full-rank training on SLMs.
ReLoRA amplifies existing rank deficiencies in SLMs and induces ill-conditioned updates.
ReLoRA's merge-and-restart strategy is not directly applicable to capacity-constrained SLMs.
These findings motivate adaptive-rank or hybrid-rank approaches for SLM pretraining.
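The rank diagnostics mentioned above can be sketched as follows. The paper's exact definitions are not given here; this sketch assumes the common entropy-based effective rank (Roy & Vetterli, 2007) normalized by the maximum possible rank, and the usual ratio of largest to smallest singular value for the condition number.

```python
import numpy as np

def proportional_effective_rank(M):
    # Entropy-based effective rank, normalized by max possible rank (assumed definition)
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()                               # singular-value distribution
    erank = np.exp(-(p * np.log(p + 1e-12)).sum())
    return erank / min(M.shape)

def condition_number(M):
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

rng = np.random.default_rng(0)
full = rng.normal(size=(64, 64))                          # full-rank update
low = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64)) # rank-4 (LoRA-like) update
```

A rank-4 update of a 64x64 matrix has proportional effective rank near 4/64 and a huge condition number, while a dense Gaussian update scores close to 1 and is well-conditioned, mirroring the rank deficiency and ill-conditioning the paper measures.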