This paper presents Input-Time Scaling, a new scaling paradigm that complements existing approaches such as data-and-training scaling and inference-time scaling for large language models (LLMs). We explore how to leverage the meta-knowledge of LLMs to refine inputs with different strategies during training and testing. Specifically, we identify a phenomenon we term "train-test co-design": query strategies must be applied jointly during training and testing to achieve the best performance. Interestingly, we find that datasets perceived as low-quality can sometimes yield better results, and that peak performance can be reached with as few as 1,000 randomly selected training examples. This contradicts the common assumption of "garbage in, garbage out." Moreover, training on more data does not always improve performance, suggesting that prevailing intuitions about data-size scaling need to be reexamined. In experiments with Qwen2.5-32B-Instruct, we achieve state-of-the-art performance (76.7% pass@1) on AIME24 and AIME25, and reach 80% on AIME25 with a majority vote over three models. Building on DeepSeek-R1-Distill-Qwen-32B, we achieve 90.0% on AIME24 and 80.0% on AIME25. To facilitate reproducibility and further research, we plan to open-source our datasets, data pipelines, evaluation results, and checkpoints.