This paper presents a method that leverages test-time reinforcement learning (TTRL) to improve the complex reasoning capability of large language models (LLMs). To address the high rollout cost and early-stage estimation bias of existing TTRL methods, we propose two strategies: Entropy-fork Tree Majority Rollout (ETMR) and Entropy-based Advantage Reshaping (EAR), both of which introduce entropy-based mechanisms to improve the exploration-exploitation balance. Experimental results on the AIME 2024 benchmark with the Llama3.1-8B model show that the proposed method achieves a 68% relative improvement in Pass@1 over the baseline model while reducing inference token consumption by 60%. These results indicate that the proposed method effectively balances inference efficiency, exploration diversity, and estimation robustness, advancing unsupervised reinforcement learning for open-ended reasoning tasks.