Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Leveraging Audio and Text Modalities in Mental Health: A Study of LLMs Performance

Created by
  • Haebom

Authors

Abdelrahman A. Ali, Aya E. Fouda, Radwa J. Hanafy, Mohammed E. Fouda

Outline

This study aims to improve the accuracy of diagnosing depression and post-traumatic stress disorder (PTSD) using large language models (LLMs). We evaluated LLMs, including Gemini 1.5 Pro and GPT-4o mini, on the E-DAIC dataset using two modalities: text and audio. To analyze how integrating the modalities affects diagnostic accuracy, we introduced two new metrics: the Modal Superiority Score and the Disagreement Resolution Score. With zero-shot inference, Gemini 1.5 Pro achieved an F1 score of 0.67 and a balanced accuracy of 77.4% on binary depression classification when text and audio were combined, outperforming either modality used alone. We also analyzed how performance changes across tasks (binary, severity, and multi-class classification) and across prompt variations.
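For readers unfamiliar with the two standard metrics reported above, the following is a minimal sketch of how F1 and balanced accuracy are computed for a binary classifier. The function name `binary_metrics` and the label lists are illustrative assumptions, not taken from the paper, and the toy data does not reproduce the reported 0.67 and 77.4% figures. The Modal Superiority Score and Disagreement Resolution Score are not defined in this summary, so they are not sketched here.

```python
def binary_metrics(y_true, y_pred):
    """Return F1 and balanced accuracy for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Balanced accuracy averages recall over the two classes,
    # which matters when one class is much rarer than the other.
    balanced_accuracy = (recall + specificity) / 2
    return {"f1": f1, "balanced_accuracy": balanced_accuracy}


# Hypothetical labels for eight participants (made-up data, not from E-DAIC).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(round(metrics["f1"], 3), round(metrics["balanced_accuracy"], 3))  # 0.667 0.733
```

Balanced accuracy is often reported alongside F1 when the classes are imbalanced, because plain accuracy can look high for a classifier that mostly predicts the majority class.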

Takeaways, Limitations

Takeaways:
Shows that LLMs can be used for multimodal mental health diagnosis.
Indicates that combining text and audio modalities can improve diagnostic accuracy over a single modality.
Shows that these results can be reached with zero-shot inference.
Evaluates both Gemini 1.5 Pro and GPT-4o mini; Gemini 1.5 Pro reached an F1 score of 0.67 and a balanced accuracy of 77.4% on binary depression classification with combined modalities.
Limitations:
Generalizability is limited because only a single dataset (E-DAIC) was used.
The Modal Superiority Score and Disagreement Resolution Score are newly proposed metrics and require additional validation.
Further research is needed to determine applicability in real clinical settings.
The results may be biased toward specific models.