This paper presents a probabilistic framework for inference-time scaling (ITS) that improves the reasoning performance of large language models (LLMs). It overcomes the limitations of conventional heuristic-based parallel sampling methods and establishes a theoretical foundation for optimal inference-time scaling under the assumption that parallel samples are independent and identically distributed (i.i.d.). By estimating the probability distribution induced by a best-of-N selection strategy, we derive a theoretical lower bound on the minimum number of samples required to reach a target performance level. Building on this bound, we develop OptScale, an algorithm that dynamically determines the optimal sample count: a language-model-based predictor estimates the parameters of the probabilistic prior, and OptScale then selects the smallest number of samples that satisfies predefined performance thresholds and confidence levels. Extensive experiments on mathematical reasoning benchmarks, including MATH-500, GSM8K, AIME, and AMC, demonstrate that OptScale significantly reduces sampling overhead while matching or exceeding state-of-the-art reasoning performance. The paper thus provides both a theoretical foundation and a practical solution for the efficient deployment of LLMs on complex reasoning tasks. The source code is publicly available.
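To make the i.i.d. intuition behind such a sample-count bound concrete, here is a minimal sketch: if each parallel sample is correct with probability p, then under the i.i.d. assumption best-of-N contains at least one correct answer with probability 1 - (1 - p)^N, which yields a closed-form minimum N for a target confidence. This is an illustrative textbook calculation only, not the paper's actual estimator or algorithm; the function name and the assumption that p is known (in OptScale it is estimated by a predictor) are ours.

```python
import math

def min_samples(p_correct: float, confidence: float) -> int:
    """Smallest N such that P(best-of-N yields a correct answer) >= confidence,
    assuming N i.i.d. samples, each correct with probability p_correct.
    Illustrative sketch; OptScale's actual bound is derived differently."""
    if not 0.0 < p_correct <= 1.0:
        raise ValueError("p_correct must be in (0, 1]")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be in [0, 1)")
    if p_correct == 1.0:
        return 1  # a single sample already suffices
    # Solve 1 - (1 - p)^N >= c  =>  N >= log(1 - c) / log(1 - p)
    n = math.log(1.0 - confidence) / math.log(1.0 - p_correct)
    return max(1, math.ceil(n))
```

For example, with a per-sample success probability of 0.3 and a 95% confidence target, the bound gives N = 9, whereas a fixed large budget (say N = 64) would waste most of its samples.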