Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate

Created by
  • Haebom

Authors

Liangwei Nathan Zheng, Wei Emma Zhang, Mingyu Guo, Miao Xu, Olaf Maennel, Weitong Chen

Outline

This paper proposes ConfSMoE, an extension of the Sparse Mixture-of-Experts (SMoE) architecture that addresses the missing-modality problem common in real-world multimodal learning. Conventional SMoE handles missing modalities poorly, which leads to performance degradation and weak generalization; ConfSMoE fills in the missing modalities with a two-stage imputation module. The authors also explain the phenomenon of expert collapse through theoretical analysis supported by experimental evidence. To counter it, they propose a new expert gating mechanism that detaches the gate from the conventional softmax routing score and instead uses a task confidence score computed with respect to the ground-truth signal. This mechanism mitigates expert collapse without an additional load-balancing loss function. The method is evaluated on four real-world datasets under three experimental settings, analyzing both its robustness to missing modalities and the impact of the proposed gating mechanism.
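For readers unfamiliar with the terms, the sketch below illustrates conventional softmax top-k routing and how expert collapse (most tokens being routed to a few experts) shows up as uneven expert load. It is not the ConfSMoE gate or the authors' code: the function names, the synthetic logits, and the +3.0 bias on expert 0 are all assumptions made up for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are the softmax scores renormalized over the selected experts.
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def expert_load(batch_logits, k):
    """Fraction of routing slots each expert receives over a batch of tokens."""
    counts = [0] * len(batch_logits[0])
    for logits in batch_logits:
        for i, _ in top_k_route(logits, k):
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical gate logits: 1000 tokens, 4 experts, top-1 routing.
rng = random.Random(0)
n_tokens, n_experts, k = 1000, 4, 1
balanced = [[rng.gauss(0.0, 1.0) for _ in range(n_experts)] for _ in range(n_tokens)]

# Assumed bias: the gate has learned to favor expert 0 (a collapsed router).
skewed = [[x + (3.0 if j == 0 else 0.0) for j, x in enumerate(row)] for row in balanced]

print("balanced load:", [round(v, 3) for v in expert_load(balanced, k)])
print("skewed load:  ", [round(v, 3) for v in expert_load(skewed, k)])
```

With unbiased logits each expert receives roughly a quarter of the tokens, while the biased gate sends nearly all of them to expert 0. Load-balancing losses are the usual remedy for this kind of imbalance; according to the summary above, ConfSMoE avoids the extra loss by changing the gating score itself.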

Takeaways, Limitations

Takeaways:
Presents an effective method for handling missing modalities in the SMoE architecture.
Provides a theoretical analysis of the expert collapse phenomenon and proposes a remedy.
Improves task performance and generalization through a new gating mechanism.
Offers a comprehensive performance analysis across several experimental settings.
Limitations:
The effectiveness of the proposed method may be limited to specific datasets.
Comparative analysis against other multimodal learning methods may be insufficient.
Further research is needed to determine how well the theoretical analysis of expert collapse generalizes.