Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

Scalable LLM Math Reasoning Acceleration with Low-rank Distillation

Created by
  • Haebom

Authors

Harry Dong, Bilge Acun, Beidi Chen, Yuejie Chi

Outline

Mathematical reasoning with large language models (LLMs) demands significant compute and time because of long generation lengths. Existing efficient inference methods preserve strong performance on language tasks but often severely degrade mathematical performance. This paper proposes Caprese, a resource-efficient distillation method that recovers the mathematical ability lost when efficient inference methods are applied, focusing on the feedforward blocks. Using only about 1% additional parameters and 20,000 synthetic training samples, and without changing the original weights, Caprese largely recovers the mathematical ability lost to efficient inference. It also reduces the number of active parameters (a cut of roughly 2 billion for Gemma 2 9B and Llama 3.1 8B), integrates seamlessly into existing model layers, encourages more compact responses (up to 8.5% fewer tokens), and lowers latency (time to next token reduced by more than 16%).
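The core idea, as summarized above, is a small trainable low-rank module attached beside a frozen feedforward block. The sketch below is an illustrative assumption, not the authors' implementation: the layer sizes, the rank, and the `efficient_ffn` stand-in (representing a pruned or otherwise accelerated FFN) are all hypothetical, and only the parameter-count arithmetic is meant to mirror the "~1% additional parameters, original weights unchanged" claim.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, rank = 64, 256, 2  # toy sizes; real models are far larger

# Frozen weights of an "efficient" feedforward block (stand-in for the
# accelerated FFN whose math ability has degraded). These are never updated.
W_in = rng.standard_normal((d_ff, d_model)) * 0.05
W_out = rng.standard_normal((d_model, d_ff)) * 0.05

def efficient_ffn(x):
    # Simple ReLU MLP as a placeholder for the efficient block.
    return W_out @ np.maximum(W_in @ x, 0.0)

# Low-rank corrective adapter: the only trainable parameters. Initialized to
# zero so the adapted block starts out identical to the frozen one. In the
# paper's setting, A and B would be trained by distillation against the
# original (unaccelerated) model's outputs.
A = np.zeros((rank, d_model))   # down-projection
B = np.zeros((d_model, rank))   # up-projection

def adapted_ffn(x):
    # Frozen path plus a rank-`rank` correction.
    return efficient_ffn(x) + B @ (A @ x)

extra = A.size + B.size
base = W_in.size + W_out.size
print(f"additional parameters: {extra} ({100 * extra / base:.2f}% of the block)")
```

With these toy sizes the adapter adds 256 parameters against 32,768 frozen ones (under 1%), and because it is initialized to zero the adapted block is exactly the frozen block until distillation training moves A and B.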

Takeaways, Limitations

Takeaways:
  • Presents a resource-efficient distillation method that effectively recovers mathematical ability lost when efficient inference methods are applied to LLMs.
  • Recovers performance with a small number of additional parameters and training samples, without changing the original weights.
  • Improves efficiency through fewer active parameters and lower latency.
  • Encourages concise responses.
Limitations:
  • Limited detail on the extent to which specific mathematical abilities are recovered.
  • Further research is needed on generalization to other models and tasks.
  • More detail is needed on how Caprese works internally.