Large Vision-Language Models (LVLMs) have made significant progress on visual understanding tasks, but their performance degrades on visual reasoning tasks because they prioritize language knowledge over image information. To address this issue, we first identify the shortcomings of existing solutions: limited multimodal reasoning capabilities, and insufficient or irrelevant visual descriptions. We then introduce a novel visual reasoning framework, ProReason, which divides the visual reasoning process into two stages: active visual recognition (Vision) and textual reasoning (Wisdom). The framework features decoupled vision and reasoning capabilities together with multi-step active recognition: ProReason iterates between active information gathering and reasoning until it can answer a given multimodal question with sufficient visual evidence. Notably, this separation of capabilities enables seamless integration with existing Large Language Models (LLMs), thereby compensating for the reasoning deficiencies of LVLMs. Extensive experiments show that ProReason outperforms existing multi-step reasoning frameworks across a variety of benchmarks, achieving an average performance improvement of 13.2%. Furthermore, through LLM integration, ProReason generates high-quality visual reasoning data, enabling our distilled models, ProReason-VL and ProReason-Q3, to achieve superior performance on downstream tasks. Our insights into existing solutions and the decoupled perspective on feasible LLM integration will inform future research on visual reasoning techniques, particularly LLM-assisted approaches.
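
To make the decoupled perceive-then-reason loop described above concrete, the following is a minimal Python sketch of such an iteration, written under our own assumptions rather than as the paper's implementation: the callables vision_agent (an LVLM answering focused visual sub-queries) and reasoning_agent (an LLM that either returns a final answer or requests further perception), as well as the prompt/return conventions, are hypothetical placeholders.

# Hypothetical sketch of a decoupled vision/reasoning loop in the spirit of ProReason.
# The agent callables and their interfaces are illustrative assumptions only.
from typing import Callable, Optional

def proreason_loop(
    question: str,
    image: bytes,
    vision_agent: Callable[[bytes, str], str],   # LVLM: answers one focused visual sub-query
    reasoning_agent: Callable[[str, list[str]], tuple[Optional[str], Optional[str]]],
    # LLM: given the question and gathered descriptions, returns either
    # (final_answer, None) when evidence suffices, or (None, next_sub_query) otherwise.
    max_steps: int = 5,
) -> str:
    """Iterate active visual recognition and textual reasoning until the
    reasoning agent judges the gathered visual evidence sufficient."""
    descriptions: list[str] = []
    for _ in range(max_steps):
        answer, sub_query = reasoning_agent(question, descriptions)
        if answer is not None:        # evidence deemed sufficient, stop perceiving
            return answer
        # Otherwise, actively gather the missing visual detail from the image.
        descriptions.append(vision_agent(image, sub_query))
    # Step budget exhausted: ask for a best-effort answer from what was gathered.
    answer, _ = reasoning_agent(question, descriptions + ["(no further perception allowed)"])
    return answer or "unable to answer"

In this sketch, the separation of roles mirrors the abstract's point about LLM integration: the reasoning_agent can be any off-the-shelf LLM, since it only ever consumes textual descriptions produced by the vision side.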