Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning

Created by
  • Haebom

Authors

Yujian Zhang, Keyu Chen, Zhifeng Shen, Ruizhi Qiao, Xing Sun

Adaptive Dual Reasoner (ADR)

Outline

To address the computational cost and inference latency caused by overthinking, this paper proposes the Adaptive Dual Reasoner (ADR), a model that supports two reasoning modes: fast thinking and slow thinking. During reasoning, ADR dynamically switches between the two modes according to the complexity of the current context. ADR is trained in two stages: (1) an initial supervised fine-tuning (SFT) stage and (2) a reinforcement learning (RL) stage that optimizes reasoning effort. For the RL stage, the authors introduce Entropy-guided Hybrid Policy Optimization (EHPO), a training framework that balances fast and slow reasoning with two components: an entropy-guided dynamic rollout strategy that branches at high-entropy units, and a difficulty-aware penalty.
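The two RL ingredients named above can be illustrated with a minimal Python sketch. It is not the authors' EHPO implementation: the function names, the use of per-position Shannon entropy with top-k selection to pick branch points, and the penalty form (a length penalty on correct answers, scaled by the fraction of rollouts in a group that solved the problem as a hypothetical proxy for how easy it is) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_branch_points(step_probs: Sequence[Sequence[float]], top_k: int) -> List[int]:
    """Hypothetical stand-in for 'branching at high-entropy units'.

    Scores every position of a rollout by its entropy and returns the top_k
    most uncertain positions (in sequence order), where new rollouts could be forked.
    """
    entropies = [token_entropy(p) for p in step_probs]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:top_k])


def difficulty_aware_reward(
    correct: bool,
    length: int,
    group_correct_rate: float,
    max_length: int,
    alpha: float = 0.5,
) -> float:
    """Hypothetical difficulty-aware reward: correctness minus a scaled length penalty.

    group_correct_rate is the fraction of rollouts for the same problem that were
    correct; a high rate marks an "easy" problem, so long answers cost more there.
    Incorrect answers simply receive 0.
    """
    if not correct:
        return 0.0
    length_ratio = min(length / max_length, 1.0)
    penalty = alpha * group_correct_rate * length_ratio
    return 1.0 - penalty


if __name__ == "__main__":
    # Four positions of a toy rollout, each with its next-token distribution.
    steps = [
        [0.97, 0.01, 0.01, 0.01],  # confident
        [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
        [0.90, 0.05, 0.05],
        [0.40, 0.40, 0.20],
    ]
    print(high_entropy_branch_points(steps, top_k=2))  # → [1, 3]

    # Same 800-token correct answer on an easy vs. a hard problem.
    print(difficulty_aware_reward(True, 800, group_correct_rate=0.9, max_length=1000))
    print(difficulty_aware_reward(True, 800, group_correct_rate=0.1, max_length=1000))
```

In this sketch, rollouts fork where the model is least certain, and a long correct answer earns less reward on a problem most rollouts already solve (about 0.64) than on a hard one (about 0.96). The intent is to nudge a policy toward short, fast reasoning on easy inputs while leaving room for slow reasoning on hard ones.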

Takeaways, Limitations

Achieves an effective balance between reasoning performance and efficiency compared with state-of-the-art approaches on mathematical reasoning benchmarks.
Improves performance by up to 6.1% while reducing reasoning output length by 49.5% to 59.3%.
Offers a new approach to the problems caused by overthinking.
Introduces a mechanism for dynamically switching between the two reasoning modes.
Optimizes reasoning effort with the EHPO reinforcement learning framework.
Further research is needed to assess how well the method generalizes and whether it applies to complex reasoning tasks beyond mathematics.
Further analysis is needed on the model's real-world deployment and scalability.