Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Compass-Thinker-7B Technical Report

Created by
  • Haebom

Author

Anxiang Zeng, Haibo Zhang, Kaixiang Mo, Long Zhang, Shuman Liu, Yanhui Huang, Yawen Liu, Yuepeng Sheng, Yuwei Huang

Outline

Compass-Thinker-7B is a 7-billion-parameter language model whose mathematical reasoning capabilities are enhanced through reinforcement learning. To address the high cost and resource constraints of applying reinforcement learning to existing large-scale language models, it was trained with an efficient reinforcement learning pipeline on a dataset of 30,000 verifiable mathematical problems. A step-by-step difficulty schedule gradually unlocks the model's potential and improves training efficiency. Notably, it achieves 40% accuracy on the AIME2024 evaluation, outperforming other reinforcement learning models of the same scale in mathematical reasoning.
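
As a rough illustration of how such a staged-difficulty reinforcement learning pipeline might be organized, here is a minimal Python sketch. It only assumes what the summary states: problems carry verifiable answers and are bucketed by difficulty, with harder buckets unlocked stage by stage. All names here (Problem, verify, run_rl_step, staged_training) are hypothetical, and the stubbed update step stands in for whatever policy-gradient method the actual report uses.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of staged-difficulty RL data scheduling.
# Not the authors' actual pipeline; names and structure are illustrative.

@dataclass
class Problem:
    question: str
    answer: str       # verifiable ground-truth answer
    difficulty: int   # e.g. 1 (easy) .. 3 (hard)

def verify(model_output: str, answer: str) -> float:
    """Binary reward from the verifiable answer (placeholder string check)."""
    return 1.0 if model_output.strip() == answer.strip() else 0.0

def run_rl_step(batch):
    """Placeholder for one RL update (e.g. a PPO-style policy-gradient step).

    A real pipeline would sample model rollouts for each problem, score them
    with verify(), and update the policy; here we stub the rollout.
    """
    rewards = [verify(p.answer, p.answer) for p in batch]  # stub: always 1.0
    return sum(rewards) / len(rewards)

def staged_training(problems, stages=3, steps_per_stage=100, batch_size=32):
    """Gradually widen the training pool: stage k trains on difficulty <= k."""
    for stage in range(1, stages + 1):
        pool = [p for p in problems if p.difficulty <= stage]
        for _ in range(steps_per_stage):
            batch = random.sample(pool, min(batch_size, len(pool)))
            avg_reward = run_rl_step(batch)
        print(f"stage {stage}: pool={len(pool)} problems, "
              f"last avg reward={avg_reward:.2f}")

if __name__ == "__main__":
    data = [Problem(f"q{i}", f"a{i}", difficulty=1 + i % 3) for i in range(3000)]
    staged_training(data)
```

One design point worth noting: because every problem has a machine-checkable answer, the reward can be a simple binary verifier rather than a learned reward model, which is presumably part of what makes a 30,000-problem dataset sufficient for this kind of RL training.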

Takeaways, Limitations

Takeaways:
Efficient reinforcement learning can achieve strong reasoning capabilities even in relatively small models, not only in large-scale ones.
A reinforcement learning strategy with step-by-step difficulty adjustment can effectively draw out a model's latent capabilities.
High-performance reasoning models can be developed even with limited resources, which points to directions for future research on reinforcement learning for large-scale models.
Limitations:
The performance evaluation of Compass-Thinker-7B was largely limited to mathematical problems; its performance on other types of reasoning tasks remains to be evaluated.
The training dataset (30,000 problems) is relatively small compared to those used for large-scale model training; larger datasets may be needed.
The generalizability of the proposed reinforcement learning pipeline, including its applicability to other problem types and models, has yet to be verified.