This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
MMFformer: Multimodal Fusion Transformer Network for Depression Detection
Created by
Haebom
Author
Md Rezwanul Haque, Md. Milon Islam, SM Taslim Uddin Raju, Hamdi Altaheri, Lobna Nassar, Fakhri Karray
Outline
This paper presents MMFformer, a novel multimodal fusion network for early depression detection that leverages diverse information from social media. MMFformer uses a Transformer network to capture spatial features from video and a Transformer encoder to model the temporal dynamics of audio. Features from the modalities are combined through late and intermediate fusion strategies to capture cross-modal correlations and extract spatiotemporal patterns associated with depression. On two large-scale depression detection datasets, D-Vlog and LMVD, the model outperforms existing state-of-the-art methods, improving the F1-score by 13.92% and 7.74%, respectively. The source code is publicly available.
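To make the two fusion strategies concrete, here is a minimal sketch of the difference between intermediate fusion (combine features, then classify) and late fusion (classify per modality, then combine scores). This is an illustrative toy, not the paper's actual MMFformer; the encoder outputs, weights, and dimensions are all hypothetical stand-ins for the video and audio Transformer branches.

```python
# Toy illustration of intermediate vs. late fusion for two modalities.
# All vectors/weights are hypothetical stand-ins, not the paper's model.

def linear(features, weights):
    """Apply a linear layer: one output per weight row (dot product)."""
    return [sum(f * w for f, w in zip(features, row)) for row in weights]

# Pretend these came from the video and audio encoder branches.
video_feat = [0.2, -0.5, 0.1, 0.7]
audio_feat = [0.4, 0.3, -0.2, 0.0]

# Intermediate fusion: concatenate modality features, then classify once.
W_mid = [[0.1] * 8, [-0.1] * 8]          # 2 classes x 8 fused features
mid_logits = linear(video_feat + audio_feat, W_mid)

# Late fusion: classify each modality separately, then average the logits.
W_video = [[0.2] * 4, [-0.2] * 4]
W_audio = [[0.3] * 4, [-0.3] * 4]
late_logits = [(v + a) / 2 for v, a in
               zip(linear(video_feat, W_video), linear(audio_feat, W_audio))]

print(len(mid_logits), len(late_logits))  # both produce 2 class logits
```

Intermediate fusion lets the classifier see cross-modal feature interactions, while late fusion keeps each modality's decision path independent; the paper evaluates both strategies.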
Takeaways, Limitations
•
Takeaways:
◦
Leveraging social media data shows potential for improving the accuracy of early depression diagnosis.
◦
Demonstrates the effectiveness of multimodal information fusion for analyzing depression-related patterns.