Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models

Created by
  • Haebom

Authors

Chih-Kai Yang, Neo Ho, Yi-Jyun Lee, Hung-yi Lee

Outline

This paper analyzes the internal mechanisms of large audio-language models (LALMs) to better understand how they perceive auditory attributes. The authors apply a lexical projection technique to three state-of-the-art LALMs, tracking how attribute information evolves across layers and token positions. They find that when attribute recognition fails, attribute information decreases with layer depth, and that resolving attributes in earlier layers correlates with higher accuracy. They also show that LALMs rely heavily on querying the auditory input rather than aggregating the necessary information into the hidden states at the positions where attributes are mentioned. Based on these findings, the paper proposes methods to improve LALM performance and outlines directions for future work.
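To make the layer-wise analysis concrete, below is a minimal sketch of a logit-lens-style lexical projection, assuming a Hugging Face causal LM interface and using gpt2 as a text-only stand-in (the paper's actual LALMs, prompts, attribute tokens, and projection details may differ). Each layer's hidden state at the final token position is projected through the model's own unembedding matrix, and the probability assigned to a chosen attribute token is read off per layer.

```python
# Sketch of a logit-lens-style lexical projection (illustrative only).
# gpt2, the prompt, and the attribute token " female" are stand-in assumptions;
# the paper applies the idea to audio-language models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The speaker's gender is"
attribute_token_id = tokenizer(" female", add_special_tokens=False)["input_ids"][0]

with torch.no_grad():
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)

# Project each layer's hidden state at the last token position into vocabulary
# space via the final layer norm and the unembedding matrix, then track the
# probability of the attribute token across layers.
ln_f = model.transformer.ln_f                      # final layer norm (GPT-2 specific)
unembed = model.get_output_embeddings().weight     # (vocab_size, hidden_dim)
for layer_idx, hidden in enumerate(outputs.hidden_states):
    last_token_state = ln_f(hidden[0, -1])         # (hidden_dim,)
    logits = last_token_state @ unembed.T          # (vocab_size,)
    prob = torch.softmax(logits, dim=-1)[attribute_token_id].item()
    print(f"layer {layer_idx:2d}: p(attribute) = {prob:.4f}")
```

In a setup like the paper's, a rise in this per-layer probability at earlier layers would correspond to the early attribute resolution that the authors associate with higher recognition accuracy.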

Takeaways, Limitations

Takeaways:
Provides a deeper understanding of how LALMs process auditory attributes.
Presents a new approach to improving LALM performance.
Highlights the importance of resolving attributes in early layers.
Offers insight into how strongly LALMs depend on querying the auditory input.
Limitations:
The analysis covers only a limited set of LALMs.
The lexical projection technique has inherent limitations, which may leave the analysis incomplete.
Further research is needed to establish the generality and scalability of the proposed performance-improvement methods.