Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please credit the source when sharing.

ThinkTuning: Instilling Cognitive Reflections without Distillation

Created by
  • Haebom

Author

Aswin RRV, Jacob Dineen, Divij Handa, Md Nayem Uddin, Mihir Parmar, Chitta Baral, Ben Zhou

Outline

Building on prior findings that reinforcement learning (RL) alone cannot elicit reasoning in large language models (LLMs) that lack it, this paper proposes ThinkTuning, a method for instilling reasoning ability in such models. ThinkTuning is a GRPO-based interactive training approach in which a teacher model guides the student model's rollouts: the teacher poses problems and provides corrective feedback on the student's answers, thereby improving the student's reasoning. Experiments show that ThinkTuning improves performance by an average of 3.85% over the zero-shot baseline across benchmarks, and by 2.08%, 2.23%, and 3.99% on MATH-500, AIME, and GPQA-Diamond, respectively. The source code is available on GitHub.
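To make the training loop concrete, below is a minimal, self-contained sketch of what a teacher-guided, GRPO-style step could look like. All class and function names (ToyStudent, ToyTeacher, grpo_advantages, train_step, etc.) are hypothetical illustrations under my own assumptions, not the authors' implementation; the real method trains LLM policies, whereas this sketch only mirrors the structure: the student samples a group of rollouts, the teacher scores them and attaches corrective feedback, and group-relative advantages drive the update.

```python
# Minimal sketch of a teacher-guided, GRPO-style training step.
# All names here are hypothetical placeholders, not the authors' code.
import random
from statistics import mean, pstdev


class ToyStudent:
    """Stand-in for the student LLM: produces a (reasoning trace, answer) rollout."""
    def generate(self, problem):
        return f"attempted reasoning for {problem!r}", random.randint(0, 10)

    def update(self, problem, rollouts, advantages, guidances):
        # Placeholder for a policy-gradient update weighted by group-relative
        # advantages; a real implementation would backpropagate through the LLM
        # and could fold the teacher's corrective feedback into future prompts.
        pass


class ToyTeacher:
    """Stand-in for the teacher LLM: scores rollouts and gives corrective feedback."""
    def score(self, problem, rollout):
        _, answer = rollout
        return 1.0 if answer == 7 else 0.0  # toy correctness reward

    def correct(self, problem, rollout):
        return "Re-check the final step against the problem constraints."


def grpo_advantages(rewards):
    """Group-relative advantages: normalize each reward against its rollout
    group (mean/std), as in GRPO, instead of using a learned value function."""
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]


def train_step(student, teacher, problem, group_size=4):
    # 1. Student samples a group of rollouts for the teacher-posed problem.
    rollouts = [student.generate(problem) for _ in range(group_size)]
    # 2. Teacher scores each rollout and attaches corrective feedback.
    rewards = [teacher.score(problem, r) for r in rollouts]
    guidances = [teacher.correct(problem, r) for r in rollouts]
    # 3. Group-relative advantages drive the student policy update.
    advantages = grpo_advantages(rewards)
    student.update(problem, rollouts, advantages, guidances)
    return rewards


if __name__ == "__main__":
    student, teacher = ToyStudent(), ToyTeacher()
    print(train_step(student, teacher, "What is 3 + 4?"))
```

The key design point this sketch tries to capture is that rewards are compared within a rollout group rather than against a learned critic, while the teacher's corrective feedback gives the student a signal beyond a scalar reward.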

Takeaways, Limitations

Takeaways:
  • Shows that an interactive teacher-student training method can improve LLM reasoning ability (see the sketch above).
  • Combines GRPO with teacher feedback into an effective way to teach thinking skills.
  • Demonstrates experimentally that reasoning can be improved even in models with limited initial reasoning ability.
  • Reports performance gains across multiple benchmarks, indicating practical effectiveness.
Limitations:
  • Training outcomes may depend heavily on the quality of the teacher model.
  • The method's effectiveness may be limited to certain types of problems or models.
  • Evaluation on more diverse and complex problems is needed.
  • Computational cost and training time must be considered.