Recent advances in large language models (LLMs) have demonstrated that reinforcement learning with verifiable rewards (RLVR) is a promising approach for solving complex logical problems. This study investigates whether current RLVR methods genuinely extend the model's reasoning range or merely improve accuracy by amplifying high-reward outputs the base model already produces. We find that, under current training conditions, RLVR operates as a support-constrained optimization mechanism: it remains bounded by the initial distribution of the base model, which can limit the discovery of entirely novel solutions. By examining the entropy-reward tradeoff, we further find that current RLVR methods improve accuracy while narrowing the search and overlooking underrepresented answers. Experimental results show that RLVR consistently improves pass@1, but under large sampling budgets the reduction in empirical support generally outweighs any expansion, so the trained model fails to recover answers previously accessible to the base model. Finally, we observe that even when token-level entropy increases, implying greater uncertainty at each generation step, answer-level entropy decreases, indicating that these uncertain paths ultimately converge to a smaller set of distinct answers.
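
The notions of answer-level entropy and empirical support used above can be made concrete with a minimal sketch. Assuming access to k sampled final answers per model for a given problem, the snippet below computes the Shannon entropy of the empirical answer distribution and compares the sets of distinct answers reached by the base and RLVR-trained models; the sample values are purely illustrative and not taken from the study.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution over final answers."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def empirical_support(answers):
    """Set of distinct answers observed under a finite sampling budget."""
    return set(answers)

# Hypothetical samples: k = 8 final answers per model for one problem (illustrative only).
base_samples = ["42", "41", "42", "37", "42", "41", "40", "42"]
rlvr_samples = ["42", "42", "42", "42", "41", "42", "42", "42"]

base_support = empirical_support(base_samples)
rlvr_support = empirical_support(rlvr_samples)

# Support shrinkage: answers the base model reaches that the RLVR model no longer produces.
lost = base_support - rlvr_support
gained = rlvr_support - base_support

print(f"base answer entropy : {answer_entropy(base_samples):.3f} nats")
print(f"RLVR answer entropy : {answer_entropy(rlvr_samples):.3f} nats")
print(f"answers lost after RLVR  : {lost}")
print(f"answers gained after RLVR: {gained}")
```

In this toy setting the RLVR samples concentrate on a single answer, so answer-level entropy drops and two answers from the base model's support are no longer reachable, mirroring the support-shrinkage effect described in the abstract.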