BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens
Created by
Haebom
Authors
Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li
Outline
This paper proposes BudgetThinker, a framework for precisely controlling the reasoning length of large language models (LLMs), enabling efficient inference in resource-constrained and real-time settings. BudgetThinker periodically inserts special control tokens during the reasoning process to continuously inform the model of its remaining token budget. This mechanism is combined with a two-stage training pipeline: supervised fine-tuning (SFT) followed by curriculum-based reinforcement learning (RL) with a length-aware reward function that balances accuracy against budget compliance. Experiments show that BudgetThinker outperforms existing length-control methods at maintaining accuracy on challenging mathematical benchmarks across a range of inference budgets.
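To make the mechanism concrete, here is a minimal Python sketch of budget-aware decoding and a length-aware reward. The model API (`next_token`, `encode`), the `<remaining=N>` token format, the insertion interval, and the reward shape are all illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of BudgetThinker-style budget control. The model API
# (next_token / encode), the "<remaining=N>" token format, the insertion
# interval, and the reward shape are illustrative assumptions.

CONTROL_INTERVAL = 128  # assumed: re-inject the budget every 128 tokens


def control_token(remaining: int) -> str:
    """Render a control token announcing the remaining budget (assumed format)."""
    return f"<remaining={remaining}>"


def generate_with_budget(model, prompt_ids: list[int], budget: int) -> list[int]:
    """Decode up to `budget` tokens, periodically appending a control
    token so the model always knows how much of its budget is left."""
    ids = list(prompt_ids)
    generated = 0
    while generated < budget:
        ids.append(model.next_token(ids))  # hypothetical greedy-decoding call
        generated += 1
        if generated % CONTROL_INTERVAL == 0 and generated < budget:
            # Mid-generation, tell the model how many tokens remain.
            ids.extend(model.encode(control_token(budget - generated)))
    return ids


def length_aware_reward(correct: bool, used: int, budget: int,
                        alpha: float = 0.5) -> float:
    """Toy length-aware RL reward: full credit for a correct answer,
    penalized in proportion to how far the output overshoots the budget."""
    accuracy = 1.0 if correct else 0.0
    overshoot = max(0, used - budget) / budget
    return accuracy - alpha * overshoot
```

Under a reward of this shape, a curriculum could, for instance, start with generous budgets and tighten them over training so the policy learns budget compliance without sacrificing accuracy.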
Takeaways and Limitations
•
Takeaways:
◦
Precise control over the length of an LLM's reasoning process enables high-quality inference even in resource-constrained environments.
◦
Increases the applicability of LLMs to real-time applications.
◦
Optimizes accuracy and budget compliance simultaneously through the two-stage SFT and RL training pipeline.
◦
Maintains consistent performance across a wide range of inference budgets.
•
Limitations:
◦
The generalization of the proposed method requires further study: results are reported only on mathematical benchmarks, so validation on other problem types is still needed.
◦
The control-token insertion scheme (e.g., insertion frequency and token format) needs further optimization, and its generalizability remains to be studied.
◦
The paper provides limited detail on the design and hyperparameter tuning of the curriculum-based reinforcement learning stage.