Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation

Created by
  • Haebom

Authors

Jiazheng Li, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Hongzhou Lin, Yi Wu, Jingzhao Zhang

Outline

This paper identifies limitations of standard reinforcement learning (RL) in improving the multi-step reasoning capability of large language models (LLMs) and proposes Question Augmentation (QuestA), a novel approach to address them. QuestA is a simple yet effective strategy that injects partial solutions into problems during RL training, reducing problem difficulty and providing more informative training signals. Applied to RL training on mathematical reasoning tasks, QuestA improves both Pass@1 and Pass@k performance, particularly on problems where standard RL struggles. Applying QuestA to strong open-source models such as DeepScaleR and OpenMath Nemotron yields new state-of-the-art results on the AIME24 (67.1%), AIME25 (59.5%), and HMMT25 (35.5%) benchmarks. The paper also provides a theoretical explanation for QuestA's improved sample efficiency, suggesting a practical and generalizable way to extend reasoning capability with RL.
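To make the core idea concrete, here is a minimal sketch of QuestA-style question augmentation: prepending part of a reference solution to a hard problem so the RL rollout starts closer to the answer. This is not the authors' implementation; the `Problem` class, `build_prompt` helper, and `hint_fraction` parameter are illustrative assumptions.

```python
# Minimal sketch of QuestA-style question augmentation (not the authors' code).
# Idea: for hard problems, prepend a partial solution to the prompt used for
# RL rollouts so the policy receives a denser, more informative training
# signal. The revealed fraction can be reduced as training progresses.

from dataclasses import dataclass


@dataclass
class Problem:
    question: str             # original math problem statement
    reference_solution: str   # step-by-step solution used to cut a hint


def build_prompt(problem: Problem, hint_fraction: float) -> str:
    """Return an augmented RL training prompt.

    hint_fraction in [0, 1] controls how much of the reference solution is
    revealed as a partial solution (0 = original problem, 1 = full solution).
    """
    steps = problem.reference_solution.split("\n")
    n_hint = int(len(steps) * hint_fraction)
    if n_hint == 0:
        return problem.question
    hint = "\n".join(steps[:n_hint])
    return (
        f"{problem.question}\n\n"
        f"Partial solution (continue from here):\n{hint}\n"
    )


if __name__ == "__main__":
    # Example: reveal roughly half of the reference solution for a hard problem.
    p = Problem(
        question="Find the last two digits of 7^2024.",
        reference_solution=(
            "Note that 7^4 = 2401 ≡ 1 (mod 100).\n"
            "So 7^2024 = (7^4)^506 ≡ 1 (mod 100).\n"
            "Therefore the last two digits are 01."
        ),
    )
    print(build_prompt(p, hint_fraction=0.5))
```

In this sketch, lowering `hint_fraction` over the course of training would gradually return problems to their original difficulty; how best to schedule the amount and timing of such hints is exactly the open question noted in the Limitations below.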

Takeaways, Limitations

Takeaways:
QuestA offers a novel approach to improving the multi-step reasoning capability of LLMs trained with RL.
It improves on existing SOTA performance in mathematical reasoning tasks.
Its improved sample efficiency suggests the possibility of more efficient RL training.
QuestA's simplicity and generalizability suggest applicability to a variety of reasoning tasks.
Limitations:
QuestA's effectiveness may be limited to certain types of problems (mathematical reasoning).
Further research is needed on QuestA's generalization to other types of reasoning tasks.
Further research is needed on how to optimize the quality and timing of the partial solutions.