This paper proposes Dual-Resolution Attentive Statistics Pooling (DRASP), a pooling framework for mean opinion score (MOS) prediction in speech quality assessment. Existing pooling methods tend to operate at a single granularity, focusing on either a global view or a frame-level analysis, and can therefore overlook complementary perceptual insights. To overcome this limitation, DRASP integrates coarse-grained global statistical summaries with fine-grained attentive analyses of key segments. This dual-resolution design captures the overall structural context and important local details at the same time, yielding more accurate and robust representations. Extensive experiments across diverse datasets (MusicEval, AES-Natural), MOS prediction backbones (CLAP-based models, AudioBox-Aesthetics), and speech generation systems demonstrate the effectiveness and superior generalization of DRASP, which improves the system-level Spearman's rank correlation coefficient (SRCC) by 10.39% over average pooling.