InfMasking: Contrastive Synergistic Information Extraction for Multimodal Representation Learning
Outline
This paper proposes InfMasking, a contrastive method for capturing synergistic information between modalities in multimodal representation learning. InfMasking applies an infinite masking strategy that randomly masks most features of each modality, retaining only partial information so that the fused representations exhibit diverse synergy patterns. The unmasked fused representation is then aligned with the masked fused representations by maximizing mutual information, encoding comprehensive synergistic information. Because the model is exposed to many combinations of partial modalities during training, it can capture rich cross-modal interactions. To keep computation tractable, the authors derive the InfMasking loss as an approximation of the mutual information estimate. Experiments on large-scale real-world datasets show that InfMasking achieves state-of-the-art performance across seven benchmarks.
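To make the masking-and-alignment idea concrete, here is a minimal sketch, not the authors' implementation: each modality's features are randomly masked at a high ratio, the masked and unmasked features are fused, and the two fused representations are aligned with an InfoNCE-style mutual-information lower bound. The function names (`random_mask`, `fuse`, `infonce_loss`), the masking ratio, and the concatenation-based fusion are illustrative assumptions; the paper's actual InfMasking loss and fusion module may differ.

```python
import torch
import torch.nn.functional as F


def random_mask(x: torch.Tensor, mask_ratio: float = 0.75) -> torch.Tensor:
    """Zero out a random subset of each sample's feature dimensions."""
    keep = (torch.rand_like(x) > mask_ratio).float()
    return x * keep


def fuse(modality_feats: list) -> torch.Tensor:
    """Toy fusion by concatenation; the paper's fusion module may differ."""
    return torch.cat(modality_feats, dim=-1)


def infonce_loss(z_full: torch.Tensor, z_masked: torch.Tensor,
                 temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE alignment between unmasked and masked fused representations,
    a standard lower bound on their mutual information."""
    z_full = F.normalize(z_full, dim=-1)
    z_masked = F.normalize(z_masked, dim=-1)
    logits = z_full @ z_masked.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(z_full.size(0), device=z_full.device)
    return F.cross_entropy(logits, targets)


# Toy usage with two modalities (e.g., image and audio features).
B, D = 32, 128
feats = [torch.randn(B, D), torch.randn(B, D)]
z_full = fuse(feats)                                       # unmasked fused representation

# "Infinite" masking is crudely approximated here by averaging over a few
# random masking patterns; the paper instead derives the InfMasking loss
# to approximate this expectation efficiently.
loss = torch.stack([
    infonce_loss(z_full, fuse([random_mask(f) for f in feats]))
    for _ in range(4)
]).mean()
print(loss.item())
```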
Takeaways and Limitations
• Takeaways:
◦ The paper highlights the importance of synergistic information in multimodal representation learning and proposes a new methodology to capture it effectively.
◦ The infinite masking strategy enables the model to learn diverse synergy patterns from partial modality combinations.
◦ The InfMasking loss approximates mutual information estimation while keeping computation efficient.
◦ State-of-the-art performance is achieved on diverse real-world datasets.
• Limitations:
◦ The theoretical grounding of the infinite masking strategy and quantitative validation of its design choices require further study.
◦ The accuracy and stability of the approximate computation of the InfMasking loss need further investigation.
◦ Scalability to larger numbers of modalities and more complex interaction patterns remains to be examined.