Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

Created by
  • Haebom

Author

Wenqiao Zhu, Ji Liu, Rongjuncheng Zhang, Haipang Wu, and Yulun Zhang

Outline

This paper proposes CARFT (Contrastive learning with Annotated CoT-based Reinforced Fine-Tuning), a novel reinforcement learning-based fine-tuning method for improving the reasoning ability of large language models (LLMs). Existing RL-based methods suffer from unstable reasoning-path sampling and neglect the annotated chains of thought (CoTs), while existing SFT approaches overemphasize the annotated CoTs. To address both issues, CARFT learns a representation for each CoT and designs novel contrastive signals from these representations to guide the fine-tuning process. CARFT fully utilizes the annotated CoTs while incorporating an unsupervised learning signal that stabilizes fine-tuning. Experimental results with three baseline methods, two base models, and two datasets demonstrate significant advantages of CARFT in terms of robustness, performance (up to 10.15% improvement), and efficiency (up to 30.62% improvement).
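To make the contrastive idea more concrete, below is a minimal PyTorch sketch of an InfoNCE-style loss over CoT representations. The tensor names `sampled_emb` and `annotated_emb` (pooled embeddings of policy-sampled and annotated CoTs) are hypothetical, and the sketch only illustrates the general technique of pulling a sampled CoT toward its annotated counterpart; it is not the exact signal defined in the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_cot_loss(sampled_emb, annotated_emb, temperature=0.1):
    """InfoNCE-style contrastive loss between CoT embeddings (illustrative only).

    sampled_emb:   (B, D) embeddings of CoTs sampled from the policy (hypothetical)
    annotated_emb: (B, D) embeddings of the annotated reference CoTs (hypothetical)
    Each sampled CoT is pulled toward its own annotated CoT and pushed away
    from the annotated CoTs of the other examples in the batch.
    """
    # Normalize so the dot product becomes cosine similarity
    sampled_emb = F.normalize(sampled_emb, dim=-1)
    annotated_emb = F.normalize(annotated_emb, dim=-1)

    # (B, B) similarity matrix; diagonal entries are the positive pairs
    logits = sampled_emb @ annotated_emb.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


# Toy usage: 4 examples, 16-dimensional CoT representations
if __name__ == "__main__":
    sampled = torch.randn(4, 16)
    annotated = torch.randn(4, 16)
    print(contrastive_cot_loss(sampled, annotated).item())
```

In a full reinforced fine-tuning loop, a term of this kind would presumably be combined with the policy objective so that sampled CoTs stay anchored to the annotated ones, which is the stabilizing role the outline attributes to the contrastive signal.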

Takeaways, Limitations

Takeaways:
Presents an effective new method for improving the reasoning performance of LLMs.
Addresses the instability and model collapse problems that limit existing RL-based methods.
Effectively utilizes annotated CoTs to improve both performance and efficiency.
Achieves a stable and efficient fine-tuning process through contrastive learning.
Limitations:
Further research is needed on the generalization performance of the proposed method.
Further experiments on different LLMs and datasets are needed.
The performance gains of CARFT may be limited to specific datasets or models.
Potential increase in computational cost due to algorithm complexity.