Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis

Created by
  • Haebom

Authors

Miaosen Luo, Yuncheng Jiang, Sijie Mai

Outline

In this paper, we propose the KAN-MCP framework to address the lack of interpretability and the modality imbalance problem in multimodal sentiment analysis (MSA). KAN-MCP combines the interpretability of Kolmogorov-Arnold Networks (KAN) with the robustness of the Multimodal Clean Pareto (MCPareto) framework. KAN transparently analyzes cross-modal interactions through univariate function decomposition, while MCPareto addresses modality imbalance and noise interference using the Dimensionality Reduction and Denoising Modal Information Bottleneck (DRD-MIB) method. DRD-MIB reduces feature dimensionality and filters out noise, providing KAN with discriminative low-dimensional inputs that lower modeling complexity while preserving sentiment-relevant information. Building on DRD-MIB's output, MCPareto dynamically adjusts each modality's gradient contribution, ensuring lossless transmission of auxiliary-modality signals and effectively mitigating modality imbalance. As a result, KAN-MCP achieves strong performance on benchmark datasets such as CMU-MOSI, CMU-MOSEI, and CH-SIMS v2, and KAN's interpretable architecture provides an intuitive visualization interface.
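
The summary does not include implementation details, so the following is only a minimal PyTorch sketch of the core idea behind KAN-style univariate function decomposition. The class name, the Gaussian RBF basis, and the toy dimensions are illustrative assumptions, not the authors' implementation: each output is a sum of learned one-variable functions of each input feature, and those per-edge functions can be inspected or plotted directly, which is what makes the fusion interpretable.

```python
# Minimal sketch of a KAN-style layer (univariate function decomposition).
# Assumption: each edge function f_{j,i} is represented in a fixed Gaussian
# RBF basis with learnable coefficients; this is NOT the paper's exact design.
import torch
import torch.nn as nn


class UnivariateKANLayer(nn.Module):
    """y_j = sum_i f_{j,i}(x_i), with each f_{j,i} expanded in an RBF basis."""

    def __init__(self, in_dim: int, out_dim: int, num_basis: int = 8):
        super().__init__()
        # Fixed RBF centers on [-1, 1]; only the coefficients are learned.
        self.register_buffer("centers", torch.linspace(-1.0, 1.0, num_basis))
        self.coeff = nn.Parameter(torch.randn(out_dim, in_dim, num_basis) * 0.1)
        self.width = 2.0 / num_basis

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> basis: (batch, in_dim, num_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # Sum univariate contributions over input features and basis functions.
        return torch.einsum("bik,oik->bo", basis, self.coeff)

    def edge_functions(self, x: torch.Tensor) -> torch.Tensor:
        # Per-edge values f_{j,i}(x_i): useful for visualizing how each
        # low-dimensional modality feature drives the prediction.
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        return torch.einsum("bik,oik->boi", basis, self.coeff)


if __name__ == "__main__":
    # Toy usage: fused low-dimensional features from three modalities
    # (hypothetically 2 dims per modality after a DRD-MIB-like bottleneck).
    fused = torch.randn(4, 6)
    head = UnivariateKANLayer(in_dim=6, out_dim=1)
    print(head(fused).shape)                 # (4, 1) sentiment score
    print(head.edge_functions(fused).shape)  # (4, 1, 6) per-feature contributions
```

In the KAN-MCP pipeline described above, a layer like this would sit on top of the low-dimensional, denoised features produced by DRD-MIB, which is what keeps the univariate decomposition tractable.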

Takeaways, Limitations

Takeaways:
Presents an interpretable multimodal sentiment analysis model based on KAN's univariate function decomposition
Mitigates modality imbalance and noise issues through DRD-MIB
Validates strong performance on the CMU-MOSI, CMU-MOSEI, and CH-SIMS v2 datasets
Provides an intuitive visualization interface
Source code is publicly released
Limitations:
Further research is needed to determine the generalizability of the presented methodology.
Additional performance evaluation on a wider variety of multimodal data is needed.
Validation of its utility in real-world applications is needed.