
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Towards the Next Frontier in Speech Representation Learning Using Disentanglement

Created by
  • Haebom

Authors

Varun Krishna, Sriram Ganapathy

Outline

In this paper, we propose Learn2Diss, a novel framework for self-supervised learning of disentangled speech representations. Unlike conventional methods that rely only on frame-wise masked prediction, Learn2Diss combines a frame-level encoder with an utterance-level encoder, so it captures both frame-level content and utterance-level properties of speech (such as speaker and channel characteristics). The frame-level encoder learns pseudo-phonemic representations using existing self-supervised learning techniques, and the utterance-level encoder learns pseudo-speaker representations through contrastive learning. The two encoders are then disentangled from each other using a mutual information-based criterion. Across a range of downstream evaluation experiments, we show that the frame-level representations improve semantic tasks, while the utterance-level representations improve non-semantic tasks. As a result, Learn2Diss achieves state-of-the-art performance on a variety of tasks.
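To make the contrastive part of the outline concrete, here is a minimal sketch of an InfoNCE-style loss over mean-pooled utterance embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names (`mean_pool`, `info_nce`), the temperature value, the choice of mean pooling, and the toy vectors are all hypothetical, and the summary does not say which contrastive loss the paper actually uses.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def mean_pool(frames):
    """Average frame-level vectors into one utterance-level vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i] and
    mismatch every positives[j] with j != i (the in-batch negatives)."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, candidate) / temperature for candidate in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(a)


# Toy data (assumed): two utterances, two "views" each, two frames per view.
view_a = [mean_pool([[1.0, 0.0], [0.8, 0.2]]), mean_pool([[0.0, 1.0], [0.1, 0.9]])]
view_b = [mean_pool([[0.9, 0.1], [1.0, 0.0]]), mean_pool([[0.2, 0.8], [0.0, 1.0]])]

loss_matched = info_nce(view_a, view_b)        # correct pairing
loss_swapped = info_nce(view_a, view_b[::-1])  # deliberately wrong pairing
```

In this toy setup the correctly paired views give a lower loss than the swapped ones, which is the behavior a contrastive objective rewards: embeddings of the same utterance are pulled together and embeddings of different utterances are pushed apart.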

Takeaways, Limitations

Takeaways:
Considering frame-level and utterance-level information jointly improves speech representation learning.
Learn2Diss achieves state-of-the-art performance on both semantic and non-semantic tasks.
Disentangling the two encoders with a mutual information-based criterion is shown to be effective.
Limitations:
The optimization process under the mutual information-based criterion may not be analyzed in sufficient detail.
Generalization to more diverse speech datasets needs further study.
Performance gains on some downstream tasks may be smaller than on others.
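As a rough aid for thinking about the mutual information criterion mentioned above, the sketch below computes a plug-in estimate of mutual information between two discrete label sequences. This is an assumption-laden diagnostic, not the training objective from the paper: the summary does not specify which estimator the authors use, and the function name and the toy "pseudo-phoneme" and "pseudo-speaker" labels are hypothetical.

```python
import math
from collections import Counter


def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete labels."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and equally long")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (count / n) * math.log(count * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical per-frame labels: cluster ids from a frame-level encoder
# and speaker-cluster ids from an utterance-level encoder.
speaker_ids = [0, 0, 1, 1]
entangled_units = [0, 0, 1, 1]      # units fully determined by the speaker
disentangled_units = [0, 1, 0, 1]   # units independent of the speaker

mi_entangled = empirical_mutual_information(entangled_units, speaker_ids)
mi_disentangled = empirical_mutual_information(disentangled_units, speaker_ids)
```

Here the entangled labels share log 2 nats of information with the speaker labels, while the independent labels share none. A disentanglement criterion of this kind pushes the two representations toward the second case.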