Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

CTTS: Collective Test-Time Scaling

Created by
  • Haebom

Authors

Zhende Song, Shengji Tang, Peng Ye, Jiayuan Fan, Lei Bai, Tao Chen, Wanli Ouyang

Outline

This paper proposes Collective Test-Time Scaling (CTTS) to overcome the limitations of Test-Time Scaling (TTS), a training-free approach to improving the performance of large language models (LLMs). Moving beyond the conventional single test-time scaling (STTS) paradigm, CTTS improves performance through the collaboration of multiple agents and multiple reward models. The authors systematically study three interaction paradigms, single agent with multiple reward models (SA-MR), multiple agents with a single reward model (MA-SR), and multiple agents with multiple reward models (MA-MR), and show that MA-MR is the most effective. Building on this finding, they propose CTTS-MM, a novel framework that maximizes LLM performance through Agent Collaboration Search (ACS) for agent collaboration and a Mixture of Reward Models (MoR) for reward-model collaboration. CTTS-MM outperforms existing STTS methods and state-of-the-art LLMs such as GPT-4.1 on a variety of benchmarks.
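To make the MA-MR idea concrete, below is a minimal, hypothetical Python sketch, not the paper's actual algorithm: several agents each propose an answer, several reward models score every candidate, and the highest-scoring candidate is returned. The names `ma_mr_select`, `agents`, and `reward_models` are illustrative stand-ins, and the uniform averaging is an assumption; the paper's MoR combines reward models in a more sophisticated way.

```python
from typing import Callable, List

Agent = Callable[[str], str]               # prompt -> candidate answer
RewardModel = Callable[[str, str], float]  # (prompt, answer) -> score

def ma_mr_select(prompt: str,
                 agents: List[Agent],
                 reward_models: List[RewardModel]) -> str:
    """Pick the agent answer with the highest mean reward across models.

    Simplified stand-in for MA-MR selection; uniform averaging is an
    assumption, not the paper's MoR weighting.
    """
    candidates = [agent(prompt) for agent in agents]   # multi-agent generation
    def mean_reward(answer: str) -> float:
        return sum(rm(prompt, answer) for rm in reward_models) / len(reward_models)
    return max(candidates, key=mean_reward)            # collective selection
```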

Takeaways, Limitations

Takeaways:
Collective Test-Time Scaling (CTTS) presents a novel approach to improving LLM performance and is particularly useful because it requires no additional training.
By having multiple agents and multiple reward models collaborate, CTTS overcomes the limitations of existing STTS methods and surpasses the performance of state-of-the-art LLMs.
Agent Collaboration Search (ACS) and Mixture of Reward Models (MoR) are the core technical contributions of CTTS-MM and increase its practical applicability (a brute-force sketch follows this list).
The open-source code release supports the reproducibility and extensibility of the research.
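As a companion sketch, again hypothetical and assuming the same agent and reward-model callables as above, an ACS-like step can be pictured as a search over small agent teams on a validation set, keeping the team whose best candidate earns the highest average reward. The paper's actual ACS is more sophisticated; this brute-force version only illustrates the "search over collaborations" idea.

```python
from itertools import combinations

def acs_like_search(val_prompts, agents, reward_models, max_team_size=3):
    """Brute-force stand-in for Agent Collaboration Search (illustrative only).

    Tries every agent team up to `max_team_size` and returns the team whose
    best per-prompt candidate achieves the highest mean reward.
    """
    def mean_reward(prompt, answer):
        return sum(rm(prompt, answer) for rm in reward_models) / len(reward_models)

    best_team, best_score = None, float("-inf")
    for k in range(1, max_team_size + 1):
        for team in combinations(agents, k):
            total = 0.0
            for prompt in val_prompts:
                answers = [agent(prompt) for agent in team]   # team proposals
                total += max(mean_reward(prompt, a) for a in answers)
            score = total / len(val_prompts)                  # team quality
            if score > best_score:
                best_team, best_score = team, score
    return list(best_team)
```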
Limitations:
Although the paper does not explicitly discuss its limitations, further research may be needed on the computational cost of CTTS-MM, the optimization of the ACS and MoR strategies, and generalizability across diverse LLMs and benchmarks.
The parameter settings and experimental environment reported in the paper may not be sufficiently detailed, so performance should be verified in other environments.