Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification

Created by
  • Haebom

Author

Bin Wu, Shinnosuke Takamichi, Sakriani Sakti, Satoshi Nakamura

Outline

This paper focuses on the vocal communication of the marmoset, a primate with diverse and complex vocalizations. Unlike human speech, marmoset calls are less structured and more variable, and they are typically recorded in noisy environments, making analysis difficult. To address these challenges, the authors pre-trained a Transformer model with a masked autoencoder (MAE), a self-supervised learning method. The MAE-pretrained Transformer outperformed CNNs on marmoset call segmentation, call classification, and caller identification. These results demonstrate the utility of self-supervised Transformer models for studying non-human communication in low-resource settings.
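To make the approach concrete, below is a minimal sketch of MAE-style pre-training on spectrograms: a large fraction of spectrogram frames is masked, a Transformer encoder sees only the visible frames, and a light decoder reconstructs the masked ones. The patching scheme, masking ratio, model sizes, and omission of positional encodings are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SpectrogramMAE(nn.Module):
    """Minimal masked-autoencoder sketch for spectrogram frames.

    Hypothetical architecture: frame-level tokens, 75% masking, a small
    Transformer encoder/decoder. Positional encodings are omitted for brevity.
    """

    def __init__(self, n_mels=64, d_model=128, n_heads=4, n_layers=4, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(n_mels, d_model)  # one spectrogram frame -> one token
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, 1)
        self.head = nn.Linear(d_model, n_mels)   # reconstruct masked frames

    def forward(self, spec):
        # spec: (batch, frames, n_mels) log-mel spectrogram
        B, T, _ = spec.shape
        n_keep = max(1, int(T * (1 - self.mask_ratio)))
        # random per-example permutation; the first n_keep frames stay visible
        perm = torch.rand(B, T).argsort(dim=1)
        keep = perm[:, :n_keep]
        tokens = self.embed(spec)                # (B, T, d_model)
        visible = torch.gather(
            tokens, 1, keep.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        encoded = self.encoder(visible)          # encode visible frames only
        # scatter encoded tokens back; masked slots get the learned mask token
        full = self.mask_token.expand(B, T, -1).clone()
        full.scatter_(1, keep.unsqueeze(-1).expand(-1, -1, full.size(-1)), encoded)
        recon = self.head(self.decoder(full))    # (B, T, n_mels)
        # reconstruction loss is computed on masked frames only
        mask = torch.ones(B, T, dtype=torch.bool)
        mask.scatter_(1, keep, False)
        loss = ((recon - spec) ** 2)[mask].mean()
        return loss, recon

model = SpectrogramMAE()
spec = torch.randn(2, 50, 64)
loss, recon = model(spec)
```

After pre-training, the encoder would be fine-tuned on the labeled downstream tasks (segmentation, classification, caller identification); the decoder is discarded.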

Takeaways, Limitations

Takeaways:
A novel approach to studying non-human communication in low-resource settings (pre-training a Transformer with an MAE).
Demonstrates that the MAE-pretrained Transformer model outperforms CNNs.
Presents an effective methodology for analyzing marmoset calls (segmentation, classification, and caller identification).
Limitations:
This model is specialized for marmoset data, and further research is needed to determine its generalizability to communication studies in other species.
Performance may be affected by the size and quality of the dataset used.
Overfitting and training instability of the Transformer model may not be fully resolved.