This paper addresses the limitations of visual problem solving with image-based tools and reinforcement learning in large multimodal models. Existing open-source approaches exhibit monotonous reasoning patterns and permit only a small number of interaction turns, making them unsuitable for difficult tasks that require trial-and-error exploration. To address this, we present Mini-o3, a system that scales up tool-based interaction: it performs deep, multi-turn reasoning spanning tens of steps and achieves state-of-the-art performance on challenging visual search tasks. Our recipe for reproducing OpenAI o3-style behaviors comprises three key components. First, we construct the Visual Probe Dataset, a collection of thousands of challenging visual search problems designed for exploratory reasoning. Second, we develop an iterative data collection pipeline to obtain cold-start trajectories that exhibit diverse reasoning patterns, including depth-first exploration, trial-and-error, and goal maintenance. Third, we propose an over-turn masking strategy that avoids penalizing over-turn responses (those that reach the maximum number of turns) during reinforcement learning, thereby balancing training-time efficiency with test-time scalability. Despite being trained with an upper bound of only six interaction turns, the model naturally generates trajectories that scale to tens of turns at inference time, with accuracy improving as the number of turns grows. Extensive experiments demonstrate that Mini-o3 produces rich reasoning patterns and deep thinking paths, effectively solving challenging visual search problems.
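To make the over-turn masking idea concrete, the sketch below shows one plausible way it could appear in a policy-gradient loss. This is an illustrative assumption, not the paper's implementation: the function name `masked_policy_loss`, the tensor layout, and the REINFORCE-style objective are all hypothetical; the key point is only that trajectories which hit the turn limit without producing an answer are excluded from the loss, so they are neither rewarded nor penalized.

```python
import torch

def masked_policy_loss(logprobs, advantages, response_mask, over_turn):
    """
    Policy-gradient loss with over-turn masking (illustrative sketch).

    logprobs:      (B, T) per-token log-probabilities of sampled responses
    advantages:    (B,)   per-trajectory advantage estimates
    response_mask: (B, T) 1 for response tokens, 0 for prompt/padding
    over_turn:     (B,)   bool, True if the trajectory reached the maximum
                          number of turns without emitting a final answer
    """
    # Drop every token of over-turn trajectories from the objective so
    # they contribute no gradient signal at all.
    keep = (~over_turn).float().unsqueeze(1)         # (B, 1)
    mask = response_mask * keep                       # (B, T)

    # Standard REINFORCE-style objective on the remaining tokens.
    per_token = -logprobs * advantages.unsqueeze(1)   # (B, T)
    denom = mask.sum().clamp(min=1.0)                 # avoid division by zero
    return (per_token * mask).sum() / denom
```

Under this reading, masking (rather than assigning a negative reward to over-turn responses) keeps training cheap at a small turn budget such as six turns, while leaving the policy free to continue well beyond that budget at inference time, which is consistent with the test-time scaling of turns the abstract reports.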