Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations

Created by
  • Haebom

Authors

Jaeyeon Kim, Heeseung Yun, Sang Hoon Woo, Chao-Han Huck Yang, Gunhee Kim

Outline

This paper points out that the low-level listening abilities of large audio language models (LALMs), such as pitch and duration detection, remain underexplored. Low-level listening is crucial for real-world, out-of-distribution tasks, where a model must reason about unfamiliar sounds from subtle acoustic cues. To address this gap, the authors present the World-of-Whale benchmark (WoW-Bench), which evaluates low-level auditory perception and cognition using marine mammal vocalizations. WoW-Bench consists of a Perception benchmark for categorizing novel sounds and a Cognition benchmark, inspired by Bloom's taxonomy, that assesses the ability to remember, understand, apply, and analyze sound events. The Cognition benchmark also includes distractor questions to test whether a model truly solves problems by listening or relies on other heuristics. Experiments with state-of-the-art LALMs show performance far below human levels, indicating the need for a stronger auditory grounding in LALMs.
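The summary does not say how distractor questions are built or scored, so the following is only a minimal sketch of one way such a check could be read. It assumes each benchmark item carries a flag marking it as a distractor and that scoring is exact-match accuracy. The names `Item`, `is_distractor`, and `summarize` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One multiple-choice question (hypothetical schema, not from the paper)."""
    question_id: str
    is_distractor: bool  # assumed flag: True for distractor questions
    gold: str            # correct answer label
    predicted: str       # model's answer label


def accuracy(items):
    """Exact-match accuracy; returns 0.0 for an empty list."""
    if not items:
        return 0.0
    return sum(i.gold == i.predicted for i in items) / len(items)


def summarize(items):
    """Report accuracy separately for regular and distractor questions."""
    regular = [i for i in items if not i.is_distractor]
    distractor = [i for i in items if i.is_distractor]
    reg_acc = accuracy(regular)
    dis_acc = accuracy(distractor)
    # A large positive gap would hint that the model leans on
    # heuristics that the distractor questions are meant to break.
    return {
        "regular_accuracy": reg_acc,
        "distractor_accuracy": dis_acc,
        "gap": reg_acc - dis_acc,
    }
```

Under these assumptions, a model that scores well on regular questions but poorly on distractor questions would be a candidate for "answering without listening"; the paper's actual protocol may differ.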

Takeaways, Limitations

Takeaways: WoW-Bench provides a new benchmark for assessing the low-level auditory perception of LALMs. It exposes the current shortcomings of LALMs' low-level listening abilities and suggests directions for future research. Structuring the Cognition benchmark around Bloom's taxonomy offers a useful way to assess a model's auditory understanding from multiple angles. Including distractor questions gives a more accurate picture of whether a model is actually listening.
Limitations: Because WoW-Bench focuses solely on marine mammal sounds, it cannot by itself evaluate the low-level listening abilities of LALMs on other types of sounds. Further research is needed to determine how well the benchmark's findings generalize. The current experiments cover only state-of-the-art LALMs, so further experiments with a wider range of model architectures and training methods are needed.