This paper presents a novel framework that uses dynamic reinforcement learning to address two limitations of the existing Probabilistic Tree-of-Thought (ProbTree) framework: its fixed inference tree and its exhaustive evaluation of all possible solution strategies. The framework incrementally builds the inference tree guided by real-time confidence estimates and learns an optimal policy for selecting actions such as decomposition, search, and aggregation. Through selective expansion and focused allocation of computational resources, it improves both solution quality and computational efficiency while preserving the probabilistic rigor of ProbTree. The result is a new tree-based reasoning paradigm that balances the reliability of probabilistic frameworks with the flexibility required for practical question-answering systems.
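
Since the abstract gives no implementation details, the following is a minimal, illustrative Python sketch of the kind of confidence-guided, policy-driven tree expansion described above. All names here (`Node`, `select_action`, `expand`, the confidence threshold `tau`, the epsilon-greedy Q-table) are hypothetical stand-ins, not the paper's actual components; in this sketch, aggregation is folded into the decomposition branch once a node's sub-questions have been answered.

```python
import random
from dataclasses import dataclass, field

# Hypothetical action set; aggregation is applied implicitly after
# a node's sub-questions have been answered recursively.
ACTIONS = ["decompose", "search", "answer"]

@dataclass
class Node:
    question: str
    confidence: float = 0.0
    answer: str = ""
    children: list["Node"] = field(default_factory=list)

def estimate_confidence(node: Node) -> float:
    # Placeholder: a real system would use a ProbTree-style
    # log-probability of the node's answer as its confidence.
    return random.random()

def select_action(node: Node, q_values: dict, epsilon: float = 0.1) -> str:
    # Epsilon-greedy policy over learned Q-values; the state is a
    # coarse confidence bucket (a deliberate simplification).
    state = round(node.confidence, 1)
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_values.get((state, a), 0.0))

def expand(node: Node, q_values: dict, tau: float = 0.8,
           depth: int = 0, max_depth: int = 3) -> Node:
    # Selective expansion: answer directly once confidence clears
    # the threshold tau, rather than building the full fixed tree.
    node.confidence = estimate_confidence(node)
    if node.confidence >= tau or depth >= max_depth:
        node.answer = f"closed_book({node.question})"
        return node
    action = select_action(node, q_values)
    if action == "decompose":
        # Decompose into sub-questions, solve them recursively,
        # then aggregate the children's answers.
        for sub in (f"{node.question} [sub-1]", f"{node.question} [sub-2]"):
            node.children.append(expand(Node(sub), q_values, tau, depth + 1, max_depth))
        node.answer = "aggregate(" + ", ".join(c.answer for c in node.children) + ")"
    elif action == "search":
        node.answer = f"open_book_search({node.question})"
    else:
        node.answer = f"closed_book({node.question})"
    return node

if __name__ == "__main__":
    q_values = {}  # learned by an RL training loop (not shown)
    root = expand(Node("Who directed the film that won Best Picture in 2020?"), q_values)
    print(root.answer)
```

Because expansion stops as soon as a node's confidence clears the threshold, resources are concentrated on uncertain sub-questions, which is the efficiency argument the abstract makes against exhaustively evaluating every strategy at every node.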