Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Modality-Specific Speech Enhancement and Noise-Adaptive Fusion for Acoustic and Body-Conduction Microphone Framework

Created by
  • Haebom

Authors

Yunsik Kim, Yoonyoung Chung

Outline

This paper proposes a multimodal framework that combines body-conduction microphone signals (BMS) and acoustic microphone signals (AMS). BMS is robust to noise but loses high-frequency information, while AMS retains high-frequency information but is susceptible to noise. The study addresses these complementary weaknesses with two networks: a mapping-based model that enhances BMS and a masking-based model that removes noise from AMS. The two outputs are combined through a dynamic fusion mechanism that adapts to local noise conditions, drawing on the strengths of each modality. In an evaluation on the TAPS dataset augmented with DNS-2023 noise clips, objective speech quality metrics show that the framework outperforms single-modal approaches across a range of noise environments.
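To make the distinction between masking and noise-adaptive fusion concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. In the paper both enhancement models are neural networks and the summary does not specify the fusion rule, so the function names (`apply_mask`, `fuse`), the sigmoid weighting on a local SNR estimate, and the `midpoint_db` and `slope` parameters are all illustrative assumptions. Spectrograms are represented as nested lists of magnitudes indexed by frame, then frequency bin.

```python
import math


def apply_mask(noisy_mag, mask):
    """Masking-based enhancement: scale each noisy bin by a gain clipped to [0, 1].

    In a real system the mask would be predicted by a network from the AMS input;
    here it is simply passed in (assumption for illustration).
    """
    return [
        [m * min(max(g, 0.0), 1.0) for m, g in zip(row_m, row_g)]
        for row_m, row_g in zip(noisy_mag, mask)
    ]


def fuse(bms_enh, ams_enh, local_snr_db, midpoint_db=0.0, slope=0.5):
    """Blend two enhanced spectrograms bin by bin.

    The AMS weight is a sigmoid of a local SNR estimate (hypothetical rule):
    high SNR favours the AMS branch, low SNR favours the BMS branch.
    bms_enh stands in for the output of a mapping-based model, which predicts
    the clean spectrum directly instead of a gain.
    """
    fused = []
    for row_b, row_a, row_s in zip(bms_enh, ams_enh, local_snr_db):
        out = []
        for b, a, s in zip(row_b, row_a, row_s):
            # Clamp the exponent so math.exp cannot overflow on extreme SNRs.
            z = max(-60.0, min(60.0, slope * (s - midpoint_db)))
            w_ams = 1.0 / (1.0 + math.exp(-z))
            out.append(w_ams * a + (1.0 - w_ams) * b)
        fused.append(out)
    return fused


if __name__ == "__main__":
    ams_noisy = [[2.0, 4.0]]          # one frame, two frequency bins
    ams_clean = apply_mask(ams_noisy, [[0.5, 1.0]])
    bms_clean = [[1.5, 0.5]]          # placeholder for a mapping-model output
    print(fuse(bms_clean, ams_clean, [[-10.0, 15.0]]))
```

The design point the sketch tries to capture is that the blend is decided per time-frequency bin rather than once per utterance, which is one plausible reading of "adapts to local noise conditions".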

Takeaways, Limitations

Takeaways:
The paper presents a novel multimodal framework that combines the advantages of body-conduction and acoustic microphones to achieve both noise suppression and restoration of high-frequency information.
Using a mapping-based network for BMS and a masking-based network for AMS improves performance over conventional methods that simply combine features.
The dynamic fusion mechanism increases adaptability to varied noise environments.
Objective speech quality evaluations support the advantage of the proposed method over single-modal approaches.
Limitations:
Generalization beyond the data used (TAPS + DNS-2023) needs further validation.
Performance evaluation in real-world environments and robustness evaluation against additional noise types are needed.
The model's complexity and computational cost need to be considered.