This page collects papers on artificial intelligence published around the world. Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis. Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.
MVCL-DAF++: Enhancing Multimodal Intent Recognition via Prototype-Aware Contrastive Alignment and Coarse-to-Fine Dynamic Attention Fusion
Created by
Haebom
Author
Haofeng Huang, Yifei Han, Long Zhang, Bin Li, Yangfan He
Outline
MVCL-DAF++ is proposed to address the weak semantic grounding of multimodal intent recognition (MMIR) and its poor robustness under noisy or rare-class conditions. It extends the existing MVCL-DAF with two modules: prototype-aware contrastive alignment, which aligns instances to class-level prototypes to enhance semantic consistency, and coarse-to-fine attention fusion, which integrates global modal summaries with token-level features for hierarchical cross-modal interaction. On the MIntRec and MIntRec2.0 datasets, MVCL-DAF++ achieves state-of-the-art performance, with WF1 gains of +1.05% and +4.18% on rare-class recognition, respectively. These results demonstrate the effectiveness of prototype-based learning and coarse-to-fine fusion for robust multimodal understanding. The source code is available at https://github.com/chr1s623/MVCL-DAF-PlusPlus .
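The paper's exact formulations are not reproduced here, but the two modules can be sketched from the description above. The following NumPy sketch is illustrative only: the function names `prototype_contrastive_loss` and `coarse_to_fine_fusion`, the mean-pooled prototypes, and the single-query attention are assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Normalize vectors so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)

def prototype_contrastive_loss(embeddings, labels, num_classes, tau=0.07):
    """Prototype-aware contrastive alignment (sketch): pull each fused
    instance embedding toward its class-level prototype (here, the mean
    of the class's normalized embeddings) and away from other prototypes."""
    z = l2_normalize(embeddings)
    protos = np.stack([z[labels == c].mean(axis=0) for c in range(num_classes)])
    protos = l2_normalize(protos)
    logits = z @ protos.T / tau                       # temperature-scaled cosine sims
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def _softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def coarse_to_fine_fusion(modal_tokens):
    """Coarse-to-fine attention fusion (sketch): mean-pool each modality's
    token sequence into a coarse summary, then let the pooled global query
    attend over all fine-grained token-level features across modalities."""
    summaries = np.stack([t.mean(axis=0) for t in modal_tokens])  # coarse summaries
    query = summaries.mean(axis=0)                    # global coarse query
    keys = np.concatenate(modal_tokens, axis=0)       # fine token-level features
    attn = _softmax(query @ keys.T / np.sqrt(keys.shape[1]))
    return attn @ keys                                # fused representation
```

Tightly clustered classes yield a lower contrastive loss than classes whose members overlap, which is the alignment pressure the prototype loss provides during training.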
•
Takeaways:
◦
Prototype-based learning and coarse-to-fine attention fusion are shown to be effective for improving multimodal intent recognition.
◦
Rare-class recognition improves significantly in particular (+1.05% and +4.18% WF1 on MIntRec and MIntRec2.0, respectively).
◦
A new state-of-the-art result for multimodal understanding is established on both benchmarks.
◦
The open-source code release supports reproducibility.
•
Limitations:
◦
Further experiments are needed to evaluate the generalization performance of the proposed model.
◦
Performance has not been evaluated on multimodal datasets beyond MIntRec and MIntRec2.0.
◦
Analysis of the model's complexity and computational cost is required.