This paper addresses the emergence of cooperation in multi-agent systems as a statistical physics problem, studying how microscopic learning rules induce macroscopic changes in collective behavior. Building on mechanisms proposed in previous studies, we introduce a Q-learning-based variant of adaptive rewiring. The method combines temporal-difference learning with network reconfiguration, allowing agents to optimize both their strategies and their social connections based on their interaction history. Neighbor-specific Q-learning enables agents to develop sophisticated partnership-management strategies, supporting the formation of cooperative clusters and creating spatial separation between cooperative and defecting regions. Using a network with a power-law degree distribution, reflecting the heterogeneous connectivity of real-world systems, we evaluate the emergent behavior under various rewiring constraints and find distinct cooperative patterns across parameter space rather than abrupt thermodynamic transitions. Through systematic analysis, we identify three behavioral regimes: a permissive regime (low constraints), an intermediate regime (sensitively dependent on dilemma intensity), and a patient regime (high constraints). Simulation results show that, while certain constraint levels create transitional regions that inhibit cooperation, fully adaptive rewiring systematically explores favorable network configurations and enhances cooperation. Quantitative analysis further shows that increasing the rewiring frequency leads to the formation of large clusters whose sizes follow a power-law distribution. These findings present a new paradigm for understanding intelligence-driven cooperative pattern formation in complex adaptive systems, showing how machine learning can serve as an alternative driving force for spontaneous organization in multi-agent networks.
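The abstract does not specify implementation details, but the mechanism it describes (neighbor-specific Q-tables updated by temporal-difference learning, with links rewired when a partnership's learned value falls below a constraint threshold) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the payoff values, learning parameters, optimistic initialization, the `REWIRE_THRESHOLD` constraint, and the reconnect-to-a-random-non-neighbor rule are all assumptions introduced here, and the Barabási–Albert generator merely stands in for "a power-law network".

```python
import random
import networkx as nx

# Illustrative parameters only; values are assumptions, not taken from the paper.
ALPHA, GAMMA, EPSILON = 0.1, 0.8, 0.05   # learning rate, discount factor, exploration rate
T, R, P, S = 1.3, 1.0, 0.1, 0.0          # prisoner's-dilemma payoffs (temptation, reward, punishment, sucker)
REWIRE_THRESHOLD = 1.0                   # hypothetical constraint: cut a link only below this learned value
N_AGENTS, STEPS = 200, 20000
ACTIONS = ("C", "D")


def payoff(a, b):
    """Focal agent's payoff when it plays a against a neighbor playing b."""
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[(a, b)]


def choose(q):
    """Epsilon-greedy choice from a neighbor-specific Q-table {action: value}."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(q, key=q.get)


def new_q():
    # Optimistic initialization so new partnerships get a chance before being dropped.
    return {a: R / (1 - GAMMA) for a in ACTIONS}


def main():
    random.seed(0)
    g = nx.barabasi_albert_graph(N_AGENTS, 2, seed=0)  # heterogeneous, power-law-like degree distribution
    # Neighbor-specific Q-tables: Q[i][j][action] is agent i's value estimate for
    # playing `action` against neighbor j.
    Q = {i: {j: new_q() for j in g[i]} for i in g}

    for _ in range(STEPS):
        i = random.choice(list(g.nodes))
        neighbors = list(g[i])
        if not neighbors:
            continue
        j = random.choice(neighbors)

        # Both agents act with respect to this specific partnership.
        ai, aj = choose(Q[i][j]), choose(Q[j][i])
        ri, rj = payoff(ai, aj), payoff(aj, ai)

        # Temporal-difference (Q-learning) update of each agent's estimate of the partnership.
        for x, y, a, r in ((i, j, ai, ri), (j, i, aj, rj)):
            best_next = max(Q[x][y].values())
            Q[x][y][a] += ALPHA * (r + GAMMA * best_next - Q[x][y][a])

        # Adaptive rewiring: if the best learned value of the link falls below the
        # constraint threshold, agent i abandons j and connects to a random non-neighbor.
        if max(Q[i][j].values()) < REWIRE_THRESHOLD:
            candidates = [k for k in g.nodes if k != i and not g.has_edge(i, k)]
            if candidates:
                k = random.choice(candidates)
                g.remove_edge(i, j)
                g.add_edge(i, k)
                Q[i].pop(j, None)
                Q[i][k] = new_q()
                Q[k].setdefault(i, new_q())

    # Crude summary: fraction of current directed partnerships whose greedy action is "cooperate".
    directed = [(a, b) for a, b in g.edges()] + [(b, a) for a, b in g.edges()]
    coop = sum(max(Q[a][b], key=Q[a][b].get) == "C" for a, b in directed) / len(directed)
    print(f"fraction of neighbor-specific policies favoring cooperation: {coop:.2f}")


if __name__ == "__main__":
    main()
```

Keeping a separate Q-table per neighbor, rather than a single table per agent, is what allows an agent to sustain cooperation with some partners while cutting ties with others, which is the partnership-management behavior the abstract attributes to neighbor-specific Q-learning.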