Daily Arxiv

This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, simply cite the source.

InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions

Created by
  • Haebom

Authors

Liangjian Wen, Qun Dai, Jianzhuang Liu, Jiangtao Zheng, Yong Dai, Dongkai Wang, Zhao Kang, Jun Wang, Zenglin Xu, Jiang Duan

InfMasking: Contrastive Synergistic Information Extraction for Multimodal Representation Learning

Outline

This paper proposes InfMasking, a contrastive method for capturing synergistic information between modalities in multimodal representation learning. InfMasking applies an infinite masking strategy: it randomly masks most of the features of each modality and keeps only partial information, which yields representations with diverse synergy patterns. The unmasked fused representation is then aligned with the masked representations by maximizing mutual information, so that it encodes comprehensive synergistic information. Because the model is exposed to many combinations of partial modalities during training, it can capture richer interactions between modalities. Since estimating mutual information under infinite masking is computationally expensive, the authors derive the InfMasking loss to approximate it. In experiments on large-scale real-world datasets, InfMasking achieves state-of-the-art performance across seven benchmarks.
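The masking-and-alignment idea described above can be illustrated with a small sketch. This is not the authors' InfMasking loss: it assumes a finite number of random masks, a parameter-free concatenation as the fusion step, and the standard InfoNCE objective (a well-known lower bound on mutual information) as the alignment term. All function names and parameters (`random_mask`, `fuse`, `info_nce`, `masked_alignment_loss`, `keep_ratio`, `num_masks`) are hypothetical and chosen only for illustration.

```python
import math
import random


def random_mask(features, keep_ratio, rng):
    """Zero out features at random, keeping each one with probability keep_ratio."""
    return [x if rng.random() < keep_ratio else 0.0 for x in features]


def fuse(modalities):
    """Hypothetical fusion step: concatenate the per-modality feature vectors."""
    return [x for modality in modalities for x in modality]


def cosine(u, v):
    """Cosine similarity with a small epsilon to tolerate all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: positives[i] is the positive for anchors[i];
    every other entry of positives serves as a negative."""
    loss = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(anchors)


def masked_alignment_loss(batch, keep_ratio=0.2, num_masks=4, seed=0):
    """Average InfoNCE between unmasked fused vectors and several masked ones.

    batch is a list of samples; each sample is a list of modality vectors.
    A finite num_masks stands in for the paper's infinite masking (assumption).
    """
    rng = random.Random(seed)
    full = [fuse(sample) for sample in batch]
    total = 0.0
    for _ in range(num_masks):
        masked = [
            fuse([random_mask(m, keep_ratio, rng) for m in sample])
            for sample in batch
        ]
        total += info_nce(full, masked)
    return total / num_masks


if __name__ == "__main__":
    # Three toy samples, each with two modalities of different sizes.
    toy_batch = [
        [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]],
        [[-1.0, 0.5, 2.0, -3.0], [1.5, 1.0, -0.5]],
        [[2.0, -2.0, 0.5, 1.0], [-1.0, 0.5, 3.0]],
    ]
    print(masked_alignment_loss(toy_batch))
```

In a real model the inputs would be learned encoder outputs and the fusion would be a trainable network, so minimizing this quantity would shape the representations. Here the sketch only shows how masked and unmasked views are paired and scored.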

Takeaways, Limitations

Takeaways:
Highlights the importance of synergistic information in multimodal representation learning and proposes a method to capture it effectively.
Learns diverse synergy patterns through an infinite masking strategy.
Derives the InfMasking loss as a computationally efficient approximation of mutual information estimation.
Achieves state-of-the-art performance on diverse real-world datasets.
Limitations:
The theoretical basis of the infinite masking strategy needs further study, along with a quantitative analysis of its validity.
The accuracy and stability of the approximation used in the InfMasking loss need further study.
Scalability to a wider range of modality combinations and more complex interaction patterns remains to be examined.