Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Diffusion-based Multi-modal Synergy Interest Network for Click-through Rate Prediction

Created by
  • Haebom

Authors

Xiaoxi Cui, Weihai Lu, Yu Tong, Yiheng Li, Zhejun Zhao

Outline

This paper argues that existing click-through rate (CTR) prediction methods rely primarily on the ID modality and therefore fail to model diverse user preferences comprehensively. To address this, the authors propose the Diffusion-based Multi-modal Synergy Interest Network (Diff-MSIN), a framework for multimodal CTR prediction. Diff-MSIN consists of three modules: the Multi-modal Feature Enhancement (MFE) Module, the Synergistic Relationship Capture (SRC) Module, and the Feature Dynamic Adaptive Fusion (FDAF) Module. Together, these modules extract synergistic, common, and distinctive information across modalities, capture user preferences, and reduce fusion noise. Experiments on Rec-Tmall and three Amazon datasets show that Diff-MSIN outperforms existing methods by at least 1.67%.
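To make the idea of adaptive multimodal fusion more concrete, here is a minimal sketch in plain Python. It is not the paper's FDAF module or any part of the released code: the function names (`adaptive_fuse`, `predict_ctr`), the gate vector, the embedding values, and the softmax gating scheme are all assumptions chosen for illustration. The sketch only shows the general pattern of weighting ID, text, and image embeddings with learned gates before scoring a click probability.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_fuse(modality_vectors, gate_vector):
    """Fuse per-modality embeddings with softmax gate weights.

    Hypothetical gating scheme (an assumption, not the paper's method):
    each modality is scored by its dot product with gate_vector, the
    scores are normalized with softmax, and the fused vector is the
    weighted sum of the modality embeddings.
    """
    names = sorted(modality_vectors)
    scores = [dot(modality_vectors[n], gate_vector) for n in names]
    weights = softmax(scores)
    fused = [0.0] * len(gate_vector)
    for name, w in zip(names, weights):
        for i, v in enumerate(modality_vectors[name]):
            fused[i] += w * v
    return fused, dict(zip(names, weights))


def predict_ctr(fused, output_weights, bias=0.0):
    """Map the fused vector to a click probability with a logistic unit."""
    z = dot(fused, output_weights) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative (made-up) embeddings for one item across three modalities.
item = {
    "id": [0.2, -0.1, 0.4],
    "text": [0.5, 0.3, -0.2],
    "image": [-0.3, 0.8, 0.1],
}
fused, weights = adaptive_fuse(item, gate_vector=[1.0, 0.5, -0.5])
probability = predict_ctr(fused, output_weights=[0.7, -0.4, 0.9])
print(weights, probability)
```

In a trained model the gate and output weights would be learned jointly with the embeddings; here they are fixed constants so that the data flow is easy to follow. A modality whose embedding aligns poorly with the gate receives a small weight, which is one simple way noisy modalities can be down-weighted during fusion.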

Takeaways, Limitations

Takeaways:
The paper proposes a novel CTR prediction framework that uses multimodal information to model users' diverse preferences more accurately.
It introduces a method that addresses the limitations of existing multimodal fusion approaches and effectively models the synergy between modalities.
Experiments show that the proposed method outperforms existing methods on several datasets.
The code is publicly available, which supports reproducibility.
Limitations:
The performance improvement of the proposed method may be limited to specific datasets.
Additional experiments using more diverse and larger datasets are needed.
There is a lack of analysis on the computational complexity and efficiency of the proposed framework.