Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Pretrained Conformers for Audio Fingerprinting and Retrieval

Created by
  • Haebom

Authors

Kemal Altwlkany, Elmedin Selmanovic, Sead Delalic

Outline

This paper presents a method for training a Conformer-based encoder that generates unique embeddings for short audio segments using a self-supervised contrastive learning framework. By leveraging the Conformer's ability to capture both local and global interactions, the authors achieve state-of-the-art performance on audio retrieval tasks while generating embeddings from only 3 seconds of audio. The model maintains this performance while remaining virtually immune to temporal misalignment and other audio artifacts, such as noise, reverberation, and extreme time stretching. The model is trained and tested on publicly available datasets of various sizes, and the code and model are released publicly to ensure that the results are reproducible.
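To make the contrastive idea concrete, here is a minimal sketch of an InfoNCE-style loss in plain Python. It is an illustration only: the summary does not say which contrastive objective the paper uses, so the loss form, the function names (`normalize`, `info_nce_loss`), the `temperature` value, and the toy 3-dimensional vectors are all assumptions. Each "anchor" stands for the embedding of a clean audio segment, and the "positive" at the same index stands for the embedding of a distorted copy of that segment.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed objective, not taken from the paper).

    anchors[i] and positives[i] are embeddings of the same audio segment
    (for example, clean and distorted); every other positive in the batch
    serves as a negative for anchor i.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of this anchor to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the true match.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy embeddings (hypothetical values): each distorted vector stays close to its clean one.
clean = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
distorted = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

matched = info_nce_loss(clean, distorted)
# Rotating the positives breaks the pairing, so the loss should increase.
shuffled = info_nce_loss(clean, distorted[1:] + distorted[:1])
print(matched < shuffled)
```

Minimizing a loss of this kind pulls the embeddings of a segment and its distorted version together while pushing apart embeddings of different segments, which is the property a fingerprinting system relies on at retrieval time.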

Takeaways, Limitations

Takeaways:
Effective embeddings can be generated from just 3 seconds of audio.
The model is robust to temporal misalignment and to audio distortions such as noise, reverberation, and extreme time stretching.
The model achieves state-of-the-art performance on audio retrieval tasks.
The code and model are publicly released, which supports reproducibility of the results.
Limitations:
The abstract does not state any specific limitations; further analysis of the full paper is required.