Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References

Created by
  • Haebom

Author

Simon Dahl Jepsen, Mads Græsbøll Christensen, Jesper Rindom Jensen

Outline

This paper investigates the consequences of using the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) as both an evaluation metric and a training objective in supervised speech separation when the training references themselves are noisy, as in WSJ0-2Mix. A derivation of SI-SDR with noisy references shows that the noise either bounds the achievable SI-SDR or forces unwanted noise into the separated output. To address this, the authors propose enhancing the reference signals and augmenting the mixtures using WHAM!, so that models are trained without learning from noisy references. Two models trained on the enhanced dataset are evaluated with the non-intrusive NISQA.v2 metric. The results show reduced noise in the separated speech, but suggest that artifacts introduced while processing the references may limit the overall quality improvement. A negative correlation between SI-SDR and perceived noise is found on the WSJ0-2Mix and Libri2Mix test sets, supporting the derivation.
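For context, SI-SDR projects the estimate onto the reference to find an optimal scaling, then measures the energy ratio between the scaled target and the residual distortion; if the reference contains noise, that noise ends up in the "target" term, which is the issue the paper analyzes. Below is a minimal NumPy sketch of the standard SI-SDR definition (not the authors' code):

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-Invariant SDR in dB: 10*log10(||a*s||^2 / ||a*s - s_hat||^2),
    where a = <s_hat, s> / ||s||^2 (standard definition)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference toward the estimate
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return 10 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    )
```

Because of the projection, rescaling the estimate leaves the score unchanged, while any component of the estimate not aligned with the reference (including noise present in a noisy reference) counts as distortion.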

Takeaways, Limitations

Takeaways: The paper identifies the pitfalls of using SI-SDR as a training objective with noisy reference data, and shows that reference enhancement and data augmentation can mitigate them. A negative correlation between SI-SDR and perceived noise is confirmed experimentally.
Limitations: Artifacts may be introduced while processing the reference data, limiting the overall quality improvement. Further research is needed to determine whether the proposed method is effective for all types of noise.