Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction

Created by
  • Haebom

Authors

Cheng-Yeh Yang, Kuan-Tang Huang, Chien-Chun Wang, Hung-Shin Lee, Hsin-Min Wang, Berlin Chen

Outline

This paper proposes the Dual-Resolution Attentive Statistics Pooling (DRASP) framework, a novel pooling mechanism for mean opinion score (MOS) prediction, a standard task in speech quality assessment. Existing pooling methods tend to work at a single granularity, either a global view or a frame-by-frame analysis, and can therefore overlook complementary perceptual insights. DRASP addresses this by integrating global statistical summaries with fine-grained, attentive analyses of key segments. The resulting representation captures the overall structural context and important local details at the same time, which makes it more accurate and robust. Extensive experiments across diverse datasets (MusicEval, AES-Natural), MOS prediction backbones (CLAP-based models, AudioBox-Aesthetics), and speech generation systems demonstrate the effectiveness and strong generalization of DRASP, which improves the system-level Spearman's rank correlation coefficient (SRCC) by 10.39% compared to average pooling.
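The dual-resolution idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tanh-based attention scoring, the fixed segment length, the fusion by concatenation, and all function and parameter names (`attentive_stats`, `dual_resolution_pool`, `segment_len`, `w`, `v`) are assumptions made for illustration, since the summary does not specify these details.

```python
import numpy as np

def attentive_stats(frames, w, v):
    """Attention-weighted mean and standard deviation over the time axis.

    frames: (T, D) features; w: (D, H) and v: (H,) are attention
    parameters (random here, learned in a real model; assumed form).
    """
    scores = np.tanh(frames @ w) @ v                 # (T,) one score per step
    scores = scores - scores.max()                   # stabilize the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()    # (T,) weights summing to 1
    mean = alpha @ frames                            # (D,) weighted mean
    var = alpha @ (frames - mean) ** 2               # (D,) weighted variance
    std = np.sqrt(np.maximum(var, 1e-8))             # clamp to avoid sqrt(0)
    return np.concatenate([mean, std])               # (2D,)

def dual_resolution_pool(frames, w, v, segment_len=4):
    """Concatenate coarse global statistics with attentive segment statistics."""
    n_frames = frames.shape[0]
    # Coarse view: unweighted mean and std over all frames.
    global_stats = np.concatenate([frames.mean(axis=0), frames.std(axis=0)])
    # Fine view: average the frames inside each segment, then let
    # attention decide which segments matter most.
    n_seg = int(np.ceil(n_frames / segment_len))
    segments = np.stack([
        frames[i * segment_len:(i + 1) * segment_len].mean(axis=0)
        for i in range(n_seg)
    ])
    local_stats = attentive_stats(segments, w, v)
    # Fusion by concatenation is an assumption of this sketch.
    return np.concatenate([global_stats, local_stats])   # (4D,)

# Usage: clips of different lengths map to vectors of the same size.
rng = np.random.default_rng(0)
w, v = rng.normal(size=(8, 16)), rng.normal(size=16)
short_clip = rng.normal(size=(50, 8))
long_clip = rng.normal(size=(73, 8))
print(dual_resolution_pool(short_clip, w, v).shape)
print(dual_resolution_pool(long_clip, w, v).shape)
```

Both calls return a vector of length 4D (32 here), regardless of the number of input frames, which is the property a pooling layer needs before a fixed-size regression head.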

Takeaways, Limitations

Takeaways:
A novel pooling mechanism, DRASP, is proposed to effectively handle variable-length audio features.
MOS prediction performance improves when global and local information are considered simultaneously.
DRASP demonstrates excellent performance and generalization across diverse datasets and models.
System-level SRCC improves significantly over average pooling (by 10.39%).
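For readers unfamiliar with the metric, system-level SRCC is commonly computed by averaging scores per generation system and then rank-correlating the predicted and human system means. The sketch below follows that common recipe under stated assumptions: the function names and the toy scores are hypothetical and do not come from the paper.

```python
from collections import defaultdict
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def system_level_srcc(system_ids, predicted, human):
    """Average scores per system, then rank-correlate the system means."""
    pred_by_sys, human_by_sys = defaultdict(list), defaultdict(list)
    for sid, p, h in zip(system_ids, predicted, human):
        pred_by_sys[sid].append(p)
        human_by_sys[sid].append(h)
    systems = sorted(pred_by_sys)
    pred_means = [np.mean(pred_by_sys[s]) for s in systems]
    human_means = [np.mean(human_by_sys[s]) for s in systems]
    return spearman(pred_means, human_means)

# Toy data (made up): three systems, two rated clips each.
system_ids = ["A", "A", "B", "B", "C", "C"]
predicted = [3.0, 3.2, 4.1, 3.9, 2.0, 2.2]
human = [3.5, 3.1, 4.4, 4.0, 2.5, 2.1]
print(system_level_srcc(system_ids, predicted, human))
```

In the toy data the predicted and human system means rank the three systems in the same order, so the correlation is 1 up to floating-point rounding. Because only the ranking of systems matters, this metric rewards a predictor that orders systems correctly even if its absolute scores are offset.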
Limitations:
The computational complexity and efficiency of DRASP are not analyzed.
Generalization across various types of audio quality degradation needs further validation.
The parameter optimization strategy for DRASP is not described in detail.