Generalizable stereo matching models are difficult to develop because their performance often degrades when the resolution or disparity range changes. We propose {S\textsuperscript{2}M\textsuperscript{2}}, which overcomes both the limitations of local search methods and the high computational cost of global matching architectures. The model achieves state-of-the-art accuracy and efficiency without cost volume filtering or deep refinement stacks. It strengthens long-range correspondence with multi-resolution transformers and uses a novel loss function that concentrates probability on valid matches, yielding more robust disparity, occlusion, and confidence estimates. As a result, it outperforms existing models on the Middlebury v3 and ETH3D benchmarks.
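
As a purely illustrative sketch, and an assumption made here rather than the definition used by the model, a loss that focuses probability on valid matches could take the following form. Let $p_i(d)$ denote a predicted matching distribution over candidate disparities $d$ at pixel $i$, $d_i^{*}$ the ground-truth disparity, $\mathcal{V}$ the set of pixels that have a valid match, and $\tau$ a tolerance. All of these symbols are hypothetical and introduced only for this example.
\begin{equation}
\mathcal{L}_{\mathrm{sketch}} = -\frac{1}{|\mathcal{V}|} \sum_{i \in \mathcal{V}} \log \sum_{d \,:\, |d - d_i^{*}| \le \tau} p_i(d)
\end{equation}
Minimizing this quantity would push the probability mass at each valid pixel toward disparities within $\tau$ of the ground truth, while pixels without a valid match, such as occluded ones, are excluded from the sum.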