Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Shadow-FT: Tuning Instruct Model via Training on Paired Base Model

Created by
  • Haebom

Author

Taiqiang Wu, Runming Yang, Jiayi Li, Pengfei Hu, Yik-Chung Wu, Ngai Wong, Yujiu Yang

Outline

While large language models (LLMs) consistently benefit from additional fine-tuning, directly tuning the Instruct model often yields only marginal improvements or even degrades performance. The paired Base model, from which the Instruct model is derived, has highly similar weights; it is a strong learner but serves as a weak backbone without subsequent post-training. This study therefore proposes Shadow-FT, a framework that tunes the Instruct model by leveraging its corresponding Base model. The core idea is to fine-tune the Base model and graft the learned weight updates directly onto the Instruct model. Shadow-FT introduces no additional parameters, is easy to implement, and significantly improves performance. Extensive experiments on leading LLMs, such as the Qwen 3 and Llama 3 series, across 19 benchmarks covering coding, reasoning, and mathematical tasks show that Shadow-FT consistently outperforms existing full-parameter and parameter-efficient tuning methods. Furthermore, Shadow-FT can be applied to multimodal LLMs (MLLMs) and combined with Direct Preference Optimization (DPO).
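The core update described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: weights are represented as plain name-to-value mappings (real models would use tensors in a state dict), and the function name is hypothetical.

```python
def shadow_ft_update(base, tuned_base, instruct):
    """Shadow-FT core idea (sketch): fine-tune the Base model, then add
    its weight delta (tuned Base minus original Base) to the paired
    Instruct model. No additional parameters are introduced.

    Each argument maps parameter names to weights (plain floats here
    for illustration; in practice these would be model tensors).
    """
    return {name: instruct[name] + (tuned_base[name] - base[name])
            for name in instruct}

# Toy example: the Base model learned a delta of +0.5 on "w",
# which is grafted onto the Instruct model's weight.
updated = shadow_ft_update(
    base={"w": 1.0},
    tuned_base={"w": 1.5},
    instruct={"w": 2.0},
)
print(updated)  # → {'w': 2.5}
```

Because the Base and Instruct models share the same architecture and closely similar weights, the delta from Base-side training transfers directly, which is why the method adds no parameters and works with any tuning recipe (full-parameter or parameter-efficient) on the Base side.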

Takeaways, Limitations

  • Shadow-FT presents a novel framework that effectively improves the performance of Instruct models.
  • It takes an innovative approach to tuning the Instruct model by leveraging the paired Base model.
  • It is easy to implement and introduces no additional parameters.
  • It outperforms existing methods on diverse benchmarks, including coding, reasoning, and mathematical problems.
  • It can be combined with multimodal LLMs and DPO.
  • The paper does not discuss specific limitations (e.g., constraints on particular models or tasks).