Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens

Created by
  • Haebom

Author

Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li

Outline

This paper proposes BudgetThinker, a framework for precisely controlling the reasoning length of a Large Language Model (LLM), enabling efficient inference in resource-constrained and real-time settings. BudgetThinker periodically inserts special control tokens during generation to continuously inform the model of its remaining token budget. This mechanism is combined with a two-stage training pipeline: supervised fine-tuning (SFT) followed by curriculum-based reinforcement learning (RL) with a length-aware reward function. Experiments show that BudgetThinker outperforms existing methods at maintaining accuracy on challenging mathematical benchmarks across a range of inference budgets.
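The two core ideas above can be illustrated with a minimal sketch. The function names, the `<remaining:N>` token format, the insertion interval, and the reward weighting below are all illustrative assumptions, not the paper's actual implementation details:

```python
def insert_budget_tokens(tokens, budget, interval=64):
    """Sketch of periodic budget-reminder insertion (hypothetical format).

    Every `interval` tokens, a control token reporting the remaining
    budget is interleaved into the stream, so the model can condition
    its remaining reasoning on how much budget is left.
    """
    out = []
    for i, tok in enumerate(tokens):
        if i > 0 and i % interval == 0:
            remaining = max(budget - i, 0)
            out.append(f"<remaining:{remaining}>")  # assumed token format
        out.append(tok)
    return out


def length_aware_reward(correct, used_tokens, budget, penalty=1.0):
    """Sketch of a length-aware RL reward (assumed shape).

    Combines task accuracy with a penalty proportional to how far the
    generation overshoots the token budget; staying within budget
    incurs no penalty.
    """
    overshoot = max(used_tokens - budget, 0) / budget
    return (1.0 if correct else 0.0) - penalty * overshoot
```

In a curriculum-based setup, training might start with generous budgets and progressively tighten them, with the reward above steering the policy toward budget compliance without sacrificing accuracy.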

Takeaways, Limitations

Takeaways:
  • Effective control of the LLM's reasoning length enables high-performance inference even in resource-constrained environments.
  • Increases the applicability of LLMs to real-time applications.
  • The SFT- and RL-based training pipeline simultaneously optimizes accuracy and budget compliance.
  • Delivers consistent performance across a variety of inference budgets.
Limitations:
  • Further research is needed on the generalization of the proposed method: results are reported only on mathematical benchmarks, so validation on other problem types is needed.
  • The insertion scheme for special control tokens needs further study regarding its optimization and generalizability.
  • The design and parameter tuning of the curriculum-based reinforcement learning may be insufficiently detailed.