This paper proposes LASER, a framework for effective image-region inference in Vision-Language Models (VLMs), addressing a key challenge in GUI grounding under high-resolution inputs and complex multi-element visual interactions. LASER integrates Monte Carlo quality estimation with IoU-based region quality assessment to progressively equip VLMs with multi-level perceptual capabilities that improve both accuracy and diversity, enabling precise coordinate prediction. This allows the model to focus on the regions relevant to an instruction and to adaptively allocate inference steps according to task complexity. Experimental results on the ScreenSpot-Pro and ScreenSpot-v2 benchmarks demonstrate the effectiveness of LASER, which shows strong performance among 7B-scale models. Specifically, LASER fine-tuned from GTA1-7B achieves a score of 55.7 on the ScreenSpot-Pro benchmark.
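The abstract does not specify how the IoU-based region quality assessment is computed; the following is a minimal illustrative sketch of the standard intersection-over-union score between a predicted region and a ground-truth region, with the box format `(x1, y1, x2, y2)` assumed for illustration only.

```python
def region_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).

    A score of 1.0 means a perfect match between the predicted region and
    the reference region; 0.0 means no overlap. This box format and helper
    are assumptions for illustration, not the paper's implementation.
    """
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Intersection area is zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Example: two unit-offset 2x2 boxes share a 1x1 overlap, so IoU = 1/7.
print(region_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Such a score could serve as a reward or filter for candidate regions, with higher-IoU regions treated as higher-quality crops for subsequent inference steps.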