Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning

Created by
  • Haebom

Authors

Lixin Wu, Na Cai, Qiao Cheng, Jiachen Wang, Yitao Duan

Outline

Confucius3-Math is an open-source large language model with 14 billion parameters that runs efficiently on a single consumer-grade GPU and achieves state-of-the-art performance on a range of mathematical reasoning tasks. It targets mathematics learning for Chinese K-12 students and educators: post-trained with large-scale reinforcement learning (RL), it aligns with the Chinese national curriculum and solves mainstream Chinese K-12 mathematics problems at low cost. The paper shares the development process, the problems encountered, and the techniques used to solve them, and introduces three technical innovations: target entropy regularization, recent sample recovery, and policy-specific difficulty weighting. These are, respectively, a new entropy regularization, a new data scheduling policy, and an improved group relative advantage estimator; together they significantly stabilize RL training, improve data efficiency, and enhance performance. The work demonstrates that powerful reasoning models for a specific domain can be built at low cost. The model and code are open-sourced on GitHub.
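
To make the phrase "group relative advantage estimator" concrete, here is a minimal sketch of the standard group-relative baseline used in RL post-training, where each sampled answer is scored against the other answers to the same problem. This is not the paper's improved estimator: the function names are hypothetical, and the difficulty weight (one minus the group's pass rate) is an assumption chosen only to illustrate what weighting by policy-specific difficulty could look like.

```python
import math


def group_relative_advantages(rewards, eps=1e-6):
    """Standard group-relative baseline: normalize each reward by the
    mean and standard deviation of its own group of sampled answers."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def difficulty_weight(rewards):
    """Hypothetical weight (an assumption, not the paper's formula):
    problems the current policy solves less often get more weight."""
    pass_rate = sum(rewards) / len(rewards)
    return 1.0 - pass_rate


def weighted_advantages(rewards):
    """Scale the group-relative advantages by the hypothetical weight."""
    w = difficulty_weight(rewards)
    return [w * a for a in group_relative_advantages(rewards)]


if __name__ == "__main__":
    # Eight sampled answers to one problem, scored 1 (correct) or 0 (wrong).
    rewards = [1, 0, 0, 0, 1, 0, 0, 0]
    print(weighted_advantages(rewards))
```

In this sketch a problem that every sample solves gets zero advantage for all answers, so it contributes no gradient signal; this is one reason data scheduling matters in this style of training.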

Takeaways, Limitations

Takeaways:
Demonstrates that high-performance mathematical reasoning models can run efficiently on consumer-grade GPUs.
Shows the potential of AI in education and knowledge dissemination through a model specialized in Chinese K-12 mathematics education.
Presents three novel techniques: target entropy regularization, recent sample recovery, and policy-specific difficulty weighting.
Demonstrates the feasibility of building powerful domain-specific reasoning models at low cost.
Contributes to academia and industry by open-sourcing the model and code.
Limitations:
The model is specific to the Chinese K-12 curriculum and may not be directly applicable to other education systems.
Performance evaluation is limited to specific datasets, so generalizability requires further research.
Further research is needed to improve the stability and efficiency of reinforcement learning-based model training.