Daily Arxiv

This page collects and organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Mitigating Strategy-Selection Bias in Reasoning for More Effective Test-Time Scaling

Created by
  • Haebom

Author

Zongqian Wu, Baoduo Xu, Tianyu Li, Zhu Sun, Xiaofeng Zhu, Lei Feng

Outline

This paper addresses reasoning-strategy selection bias in test-time scaling (TTS), a technique that improves the performance of large language models (LLMs) by sampling and aggregating diverse reasoning paths. The authors highlight that LLMs explore the solution space insufficiently, favoring particular reasoning strategies (e.g., algebraic solutions to mathematical problems) while overlooking other valid alternatives (e.g., geometric solutions). They present a theoretical analysis identifying when this selection bias undermines the effectiveness of TTS, and propose TTS-Uniform, a framework that mitigates it by (i) identifying candidate strategies, (ii) allocating the sampling budget evenly across them, and (iii) filtering out unstable strategies before aggregation. Experiments show that TTS-Uniform significantly improves scaling effectiveness on several popular LLMs and benchmark datasets.
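The three components of TTS-Uniform map onto a simple sampling loop. The sketch below is a minimal illustration under stated assumptions: identify_strategies, sample_reasoning_path, and extract_answer are hypothetical helpers standing in for the LLM-facing calls, and strategy "stability" is approximated by each strategy's internal answer agreement, which may differ from the paper's actual filtering criterion.

```python
# A minimal sketch of the TTS-Uniform pipeline summarized above; not the authors'
# implementation. identify_strategies, sample_reasoning_path, and extract_answer
# are hypothetical stand-ins for the LLM-facing calls.
from collections import Counter

def tts_uniform(problem, llm, total_budget=32, stability_threshold=0.5):
    # (i) Identify candidate reasoning strategies for this problem
    #     (e.g., "algebraic", "geometric").
    strategies = identify_strategies(llm, problem)
    if not strategies:
        return None

    # (ii) Allocate the sampling budget evenly across strategies.
    per_strategy_budget = max(1, total_budget // len(strategies))

    answers_by_strategy = {}
    for strategy in strategies:
        answers_by_strategy[strategy] = [
            extract_answer(sample_reasoning_path(llm, problem, strategy))
            for _ in range(per_strategy_budget)
        ]

    # (iii) Filter out unstable strategies before aggregation. Here "stability"
    #       is approximated by how often a strategy's samples agree on their own
    #       most frequent answer (an assumption, not the paper's exact criterion).
    kept_answers = []
    for strategy, answers in answers_by_strategy.items():
        _, top_count = Counter(answers).most_common(1)[0]
        if top_count / len(answers) >= stability_threshold:
            kept_answers.extend(answers)

    # Aggregate the surviving samples, e.g., by majority vote (self-consistency).
    return Counter(kept_answers).most_common(1)[0][0] if kept_answers else None
```

Allocating the budget uniformly keeps otherwise under-sampled but valid strategies (e.g., geometric solutions) in play, while the stability filter prevents noisy strategies from diluting the final vote.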

Takeaways, Limitations

Takeaways:
  • Identifies and theoretically analyzes, for the first time, the problem of reasoning-strategy selection bias in the TTS of LLMs.
  • Proposes the TTS-Uniform framework to mitigate this selection bias and experimentally verifies its effectiveness.
  • Demonstrates the advantages of TTS-Uniform across various LLMs and benchmark datasets.
Limitations:
  • Further research is needed to determine whether the performance improvements of TTS-Uniform generalize to all types of problems and LLMs.
  • Identifying candidate strategies and filtering out unstable ones can add computational cost.
  • How to automatically find the optimal reasoning strategy for a given problem remains an open question requiring further research.