This paper studies the transfer-learning performance of speech-based self-supervised learning models (HuBERT, WavLM, and XEUS) on bioacoustic detection and classification tasks. We show that these models generate rich latent representations of animal sounds across diverse taxa, and we analyze their characteristics through linear probing of time-averaged representations. We then extend the analysis to the influence of temporal information by varying the downstream architecture, and we study how frequency range and noise affect performance. The resulting systems are competitive with fine-tuned models pretrained on bioacoustic data, underscoring the benefit of noise-tolerant pretraining settings. These findings highlight the potential of speech-based self-supervised learning as an effective framework for advancing bioacoustic research.
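As a minimal sketch of the linear-probing setup described above, one could extract time-averaged representations from a frozen speech model and fit a linear classifier on top. The checkpoint name, the use of the final hidden layer, and the logistic-regression probe are assumptions for illustration; the paper does not fix these choices here.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoFeatureExtractor, AutoModel

# Assumed checkpoint; the specific HuBERT variant probed is not stated here.
CKPT = "facebook/hubert-base-ls960"
extractor = AutoFeatureExtractor.from_pretrained(CKPT)
model = AutoModel.from_pretrained(CKPT).eval()

@torch.no_grad()
def time_averaged_embedding(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pool the final hidden states over time to get one vector per clip."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state  # shape: (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Linear probe on frozen embeddings; X_train / y_train are hypothetical
# arrays of clip embeddings and class labels from a bioacoustic dataset.
def fit_probe(X_train: np.ndarray, y_train: np.ndarray) -> LogisticRegression:
    return LogisticRegression(max_iter=1000).fit(X_train, y_train)
```

Note that these speech checkpoints expect 16 kHz input, which bounds the representable frequency range at 8 kHz; this constraint is one motivation for studying the impact of frequency range on bioacoustic performance.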