Test-time scaling (TTS) is a training-free approach to improving the performance of large language models (LLMs), but existing work follows a single test-time scaling (STTS) paradigm that pairs one agent with one reward model. This paper proposes Collective Test-Time Scaling (CTTS), which moves beyond STTS by leveraging collaboration among multiple agents and multiple reward models. To this end, we systematically study three interaction paradigms: SA-MR (single agent, multiple reward models), MA-SR (multiple agents, single reward model), and MA-MR (multiple agents, multiple reward models), and show that MA-MR is the most effective. Building on this finding, we propose CTTS-MM, a novel framework that realizes MA-MR through Agent Collaboration Search (ACS) for agent collaboration and a Mixture of Reward Models (MoR) for reward model collaboration. Across a range of benchmarks, CTTS-MM outperforms existing STTS methods as well as state-of-the-art LLMs such as GPT-4.1.
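For intuition, the following minimal Python sketch illustrates the MA-MR paradigm that CTTS-MM instantiates: several agents each propose a candidate answer, multiple reward models score every candidate, and the highest-scoring candidate is returned. The interfaces, the `ma_mr_select` name, and the plain score averaging are illustrative assumptions, not the paper's actual ACS and MoR procedures.

```python
from typing import Callable, List

# Illustrative interfaces (assumptions, not the paper's API): an agent
# maps a prompt to a candidate answer; a reward model assigns a scalar
# score to a (prompt, answer) pair.
Agent = Callable[[str], str]
RewardModel = Callable[[str, str], float]


def ma_mr_select(
    prompt: str,
    agents: List[Agent],
    reward_models: List[RewardModel],
) -> str:
    """Select an answer under the MA-MR paradigm: every agent proposes a
    candidate, every reward model scores every candidate, and the
    candidate with the best aggregate score is returned."""
    candidates = [agent(prompt) for agent in agents]

    def aggregate(answer: str) -> float:
        # Average the scores from all reward models; simple averaging
        # stands in here for the paper's MoR aggregation.
        return sum(rm(prompt, answer) for rm in reward_models) / len(reward_models)

    return max(candidates, key=aggregate)
```

In the full framework, ACS would additionally search over which combination of candidate agents to include, a step this sketch omits.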