This paper asks whether reinforcement learning with verifiable rewards (RLVR), a widely used method for improving the ability of models to solve complex reasoning tasks, truly extends a model's reasoning boundary or merely amplifies high-reward outputs that the base model already knows, yielding improved precision. Through theoretical and empirical investigation, we offer new insight into the potential limitations of RLVR. We present a theoretical perspective showing that RLVR is constrained by the support of the base model (it cannot sample solutions with initial probability zero) and operates as a conservative reweighting mechanism that limits the discovery of entirely novel solutions. We also identify an entropy-reward tradeoff: while RLVR improves precision, it progressively narrows exploration and may overlook correct yet underrepresented solutions. Extensive experiments show that although RLVR consistently improves pass@1, the shrinkage of empirical support generally outweighs its expansion under larger sampling budgets, and RLVR fails to recover correct answers that were previously accessible to the base model. Interestingly, although RLVR sometimes increases token-level entropy, introducing greater uncertainty at each generation step, it decreases answer-level entropy, indicating that these seemingly more uncertain paths ultimately converge onto a smaller set of distinct answers. Collectively, these results reveal potential limits of RLVR in extending the reasoning horizon. Breaking these invisible constraints may require future algorithmic innovations, such as explicit exploration mechanisms or hybrid strategies that seed probability mass into underrepresented solution regions.
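To make the distinction between token-level and answer-level entropy concrete, the following is a minimal Python sketch, not drawn from the paper's code, that computes both quantities from per-step token distributions and sampled final answers; all data values are hypothetical placeholders chosen only to illustrate how per-step uncertainty can rise while the distribution over distinct final answers collapses.

```python
# Illustrative sketch only: the distributions and answers below are hypothetical.
import math
from collections import Counter

def token_level_entropy(step_distributions):
    """Mean Shannon entropy (nats) of the per-step next-token distributions."""
    entropies = [-sum(p * math.log(p) for p in dist if p > 0)
                 for dist in step_distributions]
    return sum(entropies) / len(entropies)

def answer_level_entropy(final_answers):
    """Shannon entropy (nats) of the empirical distribution over distinct final answers."""
    counts = Counter(final_answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical per-step distributions: the RLVR-tuned model is less peaked per step...
base_steps = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
rlvr_steps = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]

# ...yet its sampled completions converge onto fewer distinct final answers.
base_answers = ["42", "41", "42", "7", "42", "13", "42", "41"]
rlvr_answers = ["42", "42", "42", "42", "41", "42", "42", "42"]

print(f"token-level entropy  base={token_level_entropy(base_steps):.3f}  "
      f"rlvr={token_level_entropy(rlvr_steps):.3f}")
print(f"answer-level entropy base={answer_level_entropy(base_answers):.3f}  "
      f"rlvr={answer_level_entropy(rlvr_answers):.3f}")
```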