Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Adaptive Rectification Sampling for Test-Time Compute Scaling

Created by
  • Haebom

Author

Zhendong Tan, Xingjun Zhang, Chaoyi Hu, Yancheng Pan, Shaoxun Wang

Outline

OpenAI-o1 and DeepSeek-R1 have shown that test-time scaling can significantly improve model performance on complex tasks such as logical reasoning. This paper proposes Adaptive Rectification Sampling (AR-Sampling), which enables the model to self-correct errors at a finer, step-level granularity. AR-Sampling uses a process-supervised reward model (PRM) as a verifier, together with trigger sentences, to adaptively prompt the model to rethink at the appropriate steps. Experiments on GSM8K and MATH500 show that the proposed approach improves solution accuracy by encouraging this step-level rethinking while generating only a reasonable number of additional tokens.
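
The sketch below illustrates the general idea of step-level verification and triggered rethinking as described above; it is not the authors' implementation. The functions `generate_step` and `prm_score`, the trigger sentence, and all thresholds are hypothetical placeholders.

```python
# Minimal sketch of the AR-Sampling idea (assumed interfaces, not the paper's code).
# `generate_step` stands in for step-wise decoding by the LLM; `prm_score` stands in
# for the process-supervised reward model acting as a verifier.

TRIGGER = "Wait, let me re-check the previous step."  # hypothetical trigger sentence
THRESHOLD = 0.5    # assumed PRM score below which a step is treated as erroneous
MAX_RETRIES = 2    # cap on rethinking attempts per step to bound extra tokens


def ar_sampling(question, generate_step, prm_score, max_steps=20):
    """Generate a solution step by step, rethinking steps the PRM flags as weak."""
    steps = []
    for _ in range(max_steps):
        step = generate_step(question, steps)
        retries = 0
        # If the verifier scores the step poorly, append the trigger sentence and
        # let the model regenerate that step before committing it.
        while prm_score(question, steps, step) < THRESHOLD and retries < MAX_RETRIES:
            step = generate_step(question, steps + [TRIGGER])
            retries += 1
        steps.append(step)
        if "final answer" in step.lower():  # simple stop condition for illustration
            break
    return steps
```

The key design point is that rethinking is triggered adaptively per step by the verifier, rather than resampling the whole solution, which is how the extra token cost stays moderate.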

Takeaways, Limitations

AR-Sampling improves accuracy by encouraging the model to correct errors at the step level.
Enables adaptive, step-level rethinking by leveraging the PRM and trigger sentences.
Keeps the generation of additional tokens at a reasonable level.
Experiments are limited to the GSM8K and MATH500 datasets, so further verification of generalizability to other complex tasks is required.
The performance of the PRM and the design of the trigger sentences can affect the efficiency of the overall system.