Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Input-Time Scaling

Created by
  • Haebom

Author

Rapheal Huang (Yuming), Weilong Guo

Outline

This paper presents Input-Time Scaling, a new scaling paradigm for large language models (LLMs) that complements existing data-and-training scaling and inference-time scaling. We explore how to leverage an LLM's meta-knowledge to refine input queries with different strategies during both training and testing. In doing so, we identify a phenomenon we call "train-test co-design": query strategies must be applied jointly at training and test time to achieve the best performance. Interestingly, we find that datasets perceived as low-quality sometimes perform better, and that the best performance can be achieved with as few as 1,000 randomly selected examples, contradicting the common "garbage in, garbage out" assumption. Training on more data does not always improve performance, suggesting that existing intuitions about data-size scaling need to be re-examined. In experiments with the Qwen2.5-32B-Instruct model, we achieve state-of-the-art performance (76.7% pass@1) on AIME24 and AIME25, and reach 80% on AIME25 with a majority vote over three models. Built on DeepSeek-R1-Distill-Qwen-32B, we achieve 90.0% on AIME24 and 80.0% on AIME25. To improve reproducibility and support further research, we plan to open-source the dataset, data pipeline, evaluation results, and checkpoints.
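The core idea, rewriting the query itself with a chosen strategy at both training and test time and then aggregating answers by majority vote, can be illustrated with a minimal sketch. This is not the authors' released pipeline: the strategy prompts, the stub models, and the example answers below are hypothetical placeholders.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical input-time strategies; the paper's actual prompt templates may differ.
STRATEGIES = {
    "plain": lambda q: q,
    "meta": lambda q: f"Before solving, restate the key constraints of the problem.\n\n{q}",
}

def augment(query: str, strategy: str) -> str:
    """Apply an input-time strategy to a query (used identically at train and test time)."""
    return STRATEGIES[strategy](query)

def majority_vote(answers: List[str]) -> str:
    """Return the most common final answer across models (ties broken arbitrarily)."""
    return Counter(answers).most_common(1)[0][0]

def solve(query: str, models: List[Callable[[str], str]], strategy: str = "meta") -> str:
    """Train-test co-design: query the models with the same strategy they were trained on,
    then aggregate their answers by majority vote."""
    prompt = augment(query, strategy)
    return majority_vote([model(prompt) for model in models])

if __name__ == "__main__":
    # Stub models standing in for fine-tuned checkpoints; each returns a final answer string.
    models = [lambda p: "204", lambda p: "204", lambda p: "210"]
    print(solve("AIME-style problem statement goes here.", models))  # -> "204"
```

The sketch only highlights the train-test co-design constraint: whatever strategy rewrites the training queries must also rewrite the test queries, and the reported 80% AIME25 result additionally aggregates three models' answers by majority vote.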

Takeaways, Limitations

Takeaways:
Proposes Input-Time Scaling, a new scaling paradigm for LLMs.
Identifies the "train-test co-design" phenomenon.
Reports results that run counter to conventional intuition about data quality (datasets perceived as low-quality can perform well).
Is consistent with the "Less is More" observation (strong performance achievable with small amounts of data).
Achieves state-of-the-art performance on AIME24 and AIME25.
Supports reproducibility and follow-up research through the planned release of datasets, code, and evaluation results.
Limitations:
The open-source release is not yet complete.
The generalizability of Input-Time Scaling strategies requires further study.
No theoretical explanation is given for the "train-test co-design" phenomenon.
Additional experiments on a wider range of LLMs and datasets are needed.