This page organizes papers related to artificial intelligence published around the world. This page is summarized using Google Gemini and is operated on a non-profit basis. The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.
This paper proposes a dynamic parameter memory (DPM) mechanism to address the limited processing capacity of speech large language models (SLLMs): because speech has a high frame rate, the context window can hold only a short span of audio. DPM incrementally encodes sentence-level emotional information into temporary LoRA modules, effectively "memorizing" contextual information, so that audio of unlimited length can be processed even within a limited context window. Experiments on the IEMOCAP dataset show that DPM significantly improves the emotion recognition performance of SLLMs on long audio sequences, achieving state-of-the-art performance.
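The idea of folding each sentence into a temporary low-rank adapter can be illustrated with a toy sketch. This is not the paper's implementation: the class `TemporaryLoRA`, the function `process_conversation`, the squared-error objective, and all hyperparameters are hypothetical stand-ins chosen only to show the control flow (frozen base weight, per-sentence adapter update, adapter reset between conversations).

```python
import numpy as np


class TemporaryLoRA:
    """Toy low-rank adapter on a frozen weight matrix (illustrative assumption,
    not the paper's architecture)."""

    def __init__(self, frozen_weight, rank=2, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = frozen_weight.shape
        self.W = frozen_weight  # frozen base weight, never updated here
        self.A = rng.normal(size=(rank, d_in)) / np.sqrt(d_in)
        self.B = np.zeros((d_out, rank))  # zero init: adapter starts as a no-op

    def delta(self):
        # Low-rank update that acts as the "memory" of sentences seen so far.
        return self.B @ self.A

    def forward(self, x):
        return x @ (self.W + self.delta()).T

    def loss(self, x, y):
        # Hypothetical objective: squared error against a per-sentence target.
        err = self.forward(x) - y
        return 0.5 * np.mean(np.sum(err ** 2, axis=1))

    def memorize(self, x, y, lr=0.01, steps=50):
        # Encode one sentence into the adapter with a few gradient steps.
        n = x.shape[0]
        for _ in range(steps):
            err = self.forward(x) - y
            grad_w = err.T @ x / n          # gradient w.r.t. the effective weight
            grad_B = grad_w @ self.A.T      # chain rule through B @ A
            grad_A = self.B.T @ grad_w
            self.B -= lr * grad_B
            self.A -= lr * grad_A

    def reset(self):
        # Discard the temporary memory, e.g. when a new conversation starts.
        self.B[:] = 0.0


def process_conversation(adapter, sentences):
    # Only one sentence is "in context" at a time; earlier sentences
    # survive only through the adapter's parameters.
    for x, y in sentences:
        adapter.memorize(x, y)
    return adapter


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(8, 8)) / np.sqrt(8)
    sentences = [(rng.normal(size=(16, 8)), rng.normal(size=(16, 8)))
                 for _ in range(3)]
    adapter = process_conversation(TemporaryLoRA(W), sentences)
    print("adapter update norm:", np.linalg.norm(adapter.delta()))
```

In this sketch the cost of each step depends only on the length of the current sentence, not on the length of the whole conversation, which is the property the summary attributes to DPM.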
Takeaways, Limitations
• Takeaways:
◦ Enables processing of long speech data by overcoming the limited context window of SLLM.
◦ Effectively uses sentence-level emotional information to improve emotion recognition performance in conversations.
◦ Brings SLLM-based speech emotion recognition to state-of-the-art performance through the DPM mechanism.
• Limitations:
◦ DPM's reported performance rests on experiments with the IEMOCAP dataset alone; further research is needed to determine how well it generalizes to other datasets and more diverse speech features.
◦ The current focus is on sentence-level emotion encoding; research on using emotion information at finer-grained units (e.g., syllables, words) may be needed.
◦ Additional analysis and optimization may be needed to address the increased computational cost of DPM.