This paper presents a Test-Time Reinforcement Learning (TTRL) method for improving the complex reasoning capability of large language models (LLMs). To address the high inference cost and overconfidence issues of existing TTRL, we propose two entropy-based strategies, Entropy Branch-Tree Majority Rollout (ETMR) and Entropy-Based Advantage Reconfiguration (EAR), which improve the exploration-exploitation balance. Applying these strategies to the Llama3.1-8B model, we improve pass@1 on the AIME 2024 benchmark by 68% while using only 60% of the rollout token budget. These results demonstrate that our approach effectively balances inference efficiency, diversity, and estimation robustness in TTRL.