This paper highlights that the low-level hearing abilities of large-scale audio language models (LALMs), such as pitch and duration detection, remain underexplored. Low-level hearing is crucial for real-world, out-of-distribution tasks that require inferences about unknown sounds from subtle acoustic cues. To address this gap, we present the World-of-Whale Bench (WoW-Bench), which assesses low-level auditory perception using the sounds of marine mammals. WoW-Bench consists of a perception benchmark for classifying novel sounds and a cognition benchmark, inspired by Bloom's taxonomy, that assesses the ability to remember, understand, apply, and analyze sound events. The cognition benchmark also includes distractor questions to test whether models solve problems by genuinely listening or by relying on other heuristics. Experiments with state-of-the-art LALMs show performance far below human level, suggesting the need for a more robust auditory foundation in LALMs.