Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
The summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective

Created by
  • Haebom

Authors

Zhihao Zhang, Qiaole Dong, Qi Zhang, Jun Zhao, Enyu Zhou, Zhiheng Xi, Senjie Jin, Xiaoran Fan, Yuhao Zhou, Mingqi Wu, Yanwei Fu, Tao Ji, Tao Gui, Xuanjing Huang, Kai Chen

Outline

This paper studies how post-training algorithms such as Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT) affect the prior knowledge of multimodal large language models (MLLMs). Specifically, the authors introduce a new task, jigsaw puzzles, which does not appear in existing pretraining data, and apply SFT and RFT to the open-source multimodal Qwen2.5-VL model series. Experiments show that SFT acquires the new task quickly but causes rapid forgetting of prior knowledge, whereas RFT learns more slowly yet preserves prior knowledge. Analyzing this phenomenon through learning dynamics reveals that RFT reinforces samples that already lie close to the model's own probability distribution, thereby reducing interference with prior knowledge. Building on this, the authors show that simulating RFT's rollout process within SFT lets SFT learn new tasks quickly while preserving prior knowledge better.
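The learning-dynamics argument can be made concrete with a toy example. The sketch below is not from the paper; it is a minimal numpy illustration, assuming a categorical "policy" over five candidate responses where the model's prior favors response 1, the SFT gold label is response 0, and both responses 0 and 1 are correct. It contrasts the SFT gradient (push toward a fixed label) with a plain REINFORCE-style RFT gradient (reinforce rewarded rollouts sampled from the model itself).

```python
import numpy as np

rng = np.random.default_rng(0)

# "Prior knowledge": the model already strongly favors response 1.
logits = np.log(np.array([0.05, 0.60, 0.20, 0.10, 0.05]))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sft_step(logits, target=0):
    # Gradient of log p(target): pushes mass onto the fixed gold label,
    # however unlikely that label is under the model.
    p = softmax(logits)
    return np.eye(len(p))[target] - p

def rft_step(logits, reward, n_rollouts=4096):
    # REINFORCE: sample rollouts from the model's *own* distribution and
    # reinforce the rewarded ones, so the update concentrates on behavior
    # the model already exhibits.
    p = softmax(logits)
    g = np.zeros_like(p)
    for _ in range(n_rollouts):
        a = rng.choice(len(p), p=p)
        g += reward[a] * (np.eye(len(p))[a] - p)
    return g / n_rollouts

# Responses 0 and 1 both count as correct; SFT's gold label is response 0.
reward = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
print("SFT direction:", np.round(sft_step(logits), 3))
print("RFT direction:", np.round(rft_step(logits, reward), 3))
# SFT drags probability mass toward the low-probability label 0 (a large
# shift away from the prior); RFT mostly reinforces the already-likely
# correct response 1 (a small shift), which is the interference-reduction
# effect the paper attributes to RFT.
```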

Takeaways, Limitations

SFT acquires new tasks quickly but causes forgetting of prior knowledge.
RFT learns more slowly but retains prior knowledge better.
How the training data is distributed relative to the model's own distribution plays an important role in forgetting.
Simulating RFT's rollouts within SFT can enhance SFT's ability to preserve prior knowledge (see the sketch after this list).
This study is limited to the Qwen2.5-VL model series, and further research is needed to determine its generalizability to other models.
The computational cost of RFT may be higher than that of SFT.
Since results are presented only for a specific task (jigsaw puzzles), generalization to other tasks needs to be verified through additional experiments.
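One plausible reading of the rollout-simulation takeaway is sketched below. This is an assumption, not the paper's stated procedure: instead of fine-tuning on fixed gold answers, sample responses from the model itself, keep only verified-correct ones, and run ordinary SFT on that on-distribution data. The names `generate`, `is_correct`, and the overall function are hypothetical stand-ins, not APIs from the paper or any library.

```python
from typing import Callable, Iterable

def build_rollout_sft_data(
    prompts: Iterable[str],
    generate: Callable[[str, int], list[str]],  # samples n rollouts per prompt from the model
    is_correct: Callable[[str, str], bool],     # task-specific verifier (e.g., a puzzle checker)
    n_rollouts: int = 8,
) -> list[tuple[str, str]]:
    """Collect (prompt, response) pairs drawn from the model's own distribution.

    Because every kept response was already likely under the model, SFT on
    this data should perturb prior knowledge less than SFT on arbitrary gold
    labels, mimicking the selection effect of RFT's rollouts.
    """
    data = []
    for prompt in prompts:
        for response in generate(prompt, n_rollouts):
            if is_correct(prompt, response):
                data.append((prompt, response))
                break  # one on-distribution demonstration per prompt suffices here
    return data
```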