Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling

Created by
  • Haebom

Author

Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Chenhao Zhu, Xinzhe Juan, Ling Yang, Huazheng Wang, Kaixuan Huang, Yue Wu, Mengdi Wang

Outline

This paper proposes a method for improving large language model performance via inference-time alignment. Conventional Best-of-N (BoN) sampling incurs high computational cost; the proposed TreeBoN integrates a speculative tree-search strategy into BoN sampling, reducing computation while maintaining high output quality. TreeBoN uses token-level rewards derived from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. Evaluations on the AlpacaFarm, HH-RLHF, UltraFeedback, GSM8K, and TutorEval datasets show that TreeBoN outperforms conventional BoN, achieving up to a 65% win rate on TutorEval.
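The core idea, expand partial responses in a tree, score them with a token-level reward, and prune weak branches before going deeper, can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: the `expand` and `reward` stand-ins below are hypothetical placeholders for an LLM decoder and a DPO-derived reward model.

```python
import random

def treebon_sketch(prompt, expand, reward, depth=3, width=4, keep=2):
    """Toy sketch of TreeBoN-style search: grow each surviving partial
    response into `width` speculative continuations, score them with a
    token-level reward, and keep only the top `keep` before descending."""
    beams = [prompt]
    for _ in range(depth):
        candidates = [beam + expand(beam) for beam in beams for _ in range(width)]
        # prune low-reward paths, as TreeBoN does with DPO-derived rewards
        candidates.sort(key=reward, reverse=True)
        beams = candidates[:keep]
    return max(beams, key=reward)

# Hypothetical stand-ins: the real method uses an LLM and DPO rewards.
random.seed(0)
expand = lambda text: " " + random.choice(["good", "bad", "ok"])
reward = lambda text: text.count("good")  # crude proxy reward
best = treebon_sketch("answer:", expand, reward)
```

Pruning at every level is what distinguishes this from plain BoN, which would sample `width ** depth` full responses and score them only at the end.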

Takeaways, Limitations

Takeaways:
TreeBoN is an efficient new framework for inference-time alignment.
It reduces computational cost relative to conventional BoN while maintaining high output quality.
It performs well across diverse datasets, reaching a 65% win rate on TutorEval.
Token-level DPO rewards effectively guide tree expansion and pruning.
Limitations:
TreeBoN's performance gains may be limited to specific datasets and models; experiments with a broader range of models and datasets are needed.
Because the method depends on DPO-derived rewards, TreeBoN's performance may be affected by the quality of the underlying DPO model.
Due to the complexity of the tree-search strategy, computational cost may still be high in some settings; further research is needed to determine optimal tree-search parameters.
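For context on the DPO dependence noted above: the DPO framework implies a reward expressible from the policy and reference model log-probabilities, which is what makes token-level (partial-sequence) scoring possible without a separate reward model. A standard form of this implicit reward, from the DPO formulation (notation here is the DPO paper's, not necessarily TreeBoN's exact parameterization):

\[
r(x, y) \;=\; \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\]

Because the sequence log-probability factorizes over tokens, a partial response \(y_{1:t}\) can be scored by the corresponding partial sum of per-token log-ratios, which is the kind of token-level signal used to rank and prune tree branches.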