This study aims to improve the accuracy of diagnosing depression and post-traumatic stress disorder (PTSD) using large language models (LLMs). We evaluated LLMs, including Gemini 1.5 Pro and GPT-4o mini, on the E-DAIC dataset using two modalities: text and audio. Specifically, we analyzed the impact of modality integration on diagnostic accuracy using two new metrics: the Modal Superiority Score and the Disagreement Resolution Score. With zero-shot inference, Gemini 1.5 Pro achieved an F1 score of 0.67 and a balanced accuracy of 77.4% for binary depression classification when combining text and audio, improving on its performance with either modality alone. Furthermore, we analyzed how performance changes across tasks (binary, severity, and multi-class classification) and prompt variations.
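For readers less familiar with the two headline metrics, the following is a minimal sketch of how an F1 score and a balanced accuracy are conventionally computed for a binary task. It uses the standard textbook definitions and is not taken from the study's evaluation code. The function name `binary_metrics` and the toy labels are hypothetical, chosen only for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute F1 and balanced accuracy for binary labels (1 = positive class).

    Illustrative helper only; the name and interface are assumptions.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Balanced accuracy is the mean of sensitivity and specificity.
    balanced_accuracy = (recall + specificity) / 2

    return {"f1": f1, "balanced_accuracy": balanced_accuracy}


if __name__ == "__main__":
    # Toy labels, not data from E-DAIC.
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

Balanced accuracy averages the per-class recall, so it is less sensitive to class imbalance than plain accuracy, which is one common reason to report it alongside F1 for screening tasks.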