Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Value-Guided Search for Efficient Chain-of-Thought Reasoning

Created by
  • Haebom

Author

Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun

Outline

This paper proposes a simple and efficient method for training a value model over long-context reasoning traces. Unlike existing Process Reward Models (PRMs), the method does not require a fine-grained notion of a "step," which is difficult to define in long-context reasoning models. The authors train a 1.5B token-level value model on a dataset of 2.5 million reasoning traces and apply it to DeepSeek models to improve performance through test-time compute scaling. They find that block-wise Value-Guided Search (VGS) with a final weighted majority vote scales better at test time than standard methods such as simple majority voting or best-of-n. VGS also substantially reduces the inference FLOPs required to match the performance of majority voting. The dataset, model, and codebase are publicly available.
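To make the procedure concrete, the sketch below shows block-wise VGS with a final value-weighted majority vote. This is a minimal illustration, not the authors' released code: `sample_block`, `value`, `is_finished`, `extract_answer`, and the beam/branch parameters are hypothetical placeholders standing in for the policy model, the learned value model, and answer parsing.

```python
# Minimal sketch of block-wise Value-Guided Search (VGS) with a final
# value-weighted majority vote. All names below are hypothetical
# placeholders, not the authors' released implementation.
from collections import defaultdict

def sample_block(prefix: str) -> str:
    """Placeholder: sample one fixed-size block of tokens from the policy
    (e.g., a DeepSeek distilled model), conditioned on the prefix."""
    raise NotImplementedError

def value(trace: str) -> float:
    """Placeholder for the learned value model: scores a (partial) trace by
    its predicted probability of reaching a correct final answer."""
    raise NotImplementedError

def is_finished(trace: str) -> bool:
    """Placeholder: True once the trace contains a final answer."""
    raise NotImplementedError

def extract_answer(trace: str) -> str:
    """Placeholder: parse the final answer out of a completed trace."""
    raise NotImplementedError

def vgs(question: str, beam_width: int = 4, branch: int = 4,
        max_blocks: int = 64) -> str:
    beams = [question] * beam_width
    finished: list[str] = []

    for _ in range(max_blocks):
        if not beams:
            break
        # Branch each surviving prefix into several candidate
        # continuations, one block (not one "step") at a time.
        candidates = [prefix + sample_block(prefix)
                      for prefix in beams for _ in range(branch)]
        # Keep the highest-value partial traces; set completed ones aside.
        candidates.sort(key=value, reverse=True)
        beams = []
        for trace in candidates:
            if is_finished(trace):
                finished.append(trace)
            elif len(beams) < beam_width:
                beams.append(trace)

    # Final weighted majority vote: each completed trace votes for its
    # answer with weight equal to its value score.
    votes: dict[str, float] = defaultdict(float)
    for trace in finished:
        votes[extract_answer(trace)] += value(trace)
    return max(votes, key=votes.get)
```

Because candidates are scored one fixed-size block at a time, the search needs no step segmentation, which is what lets the value model guide very long reasoning traces.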

Takeaways, Limitations

Takeaways:
Presents an efficient method for training value models for long-context reasoning models.
Training requires no fine-grained "step" definitions.
Block-wise Value-Guided Search (VGS) improves test-time compute scaling.
Reduces the inference FLOPs needed to match the performance of majority voting.
Dataset, model, and codebase are publicly released.
Limitations:
The paper does not explicitly discuss its limitations.