This study used automatic speech recognition (ASR) to assess formal thought disorder (FTD), a core symptom of schizophrenia spectrum disorders, objectively and at scale. To overcome the limitations of existing clinical rating scales, we analyzed linguistic and temporal features of speech derived from ASR output, in particular pause dynamics, and used them to predict FTD severity. Across three datasets (naturalistic self-recorded diaries, structured picture descriptions, and dream narratives), we fit support vector regression (SVR) models that combined pause-related features with established semantic coherence measures. Pause features alone strongly predicted FTD severity, and the model integrating pause features with semantic coherence measures outperformed the semantics-only model (maximum correlation coefficient ρ = 0.649, AUC = 83.71%). These results suggest that a framework combining temporal and semantic analysis can improve the assessment of disorganized speech and advance automated speech analysis in psychosis.
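To make the notion of pause-related features concrete, the following is a minimal sketch of how such features might be derived from word-level ASR timestamps. It is not the pipeline used in this study: the function name `pause_features`, the 0.25 s threshold `min_pause`, the three feature names, and the toy transcript are all illustrative assumptions.

```python
from statistics import mean


def pause_features(words, min_pause=0.25):
    """Summarize silent gaps between consecutive words.

    words: list of (token, start_sec, end_sec) tuples in time order,
           a hypothetical stand-in for word-level ASR output.
    min_pause: minimum gap (seconds) counted as a pause; an assumed value.
    """
    # Gap between the end of each word and the start of the next one.
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]
    # Keep only gaps long enough to count as pauses.
    pauses = [g for g in gaps if g >= min_pause]
    # Total duration from the first word onset to the last word offset.
    total_time = words[-1][2] - words[0][1] if words else 0.0
    return {
        "pause_count": len(pauses),
        "mean_pause_sec": mean(pauses) if pauses else 0.0,
        "pause_time_ratio": sum(pauses) / total_time if total_time > 0 else 0.0,
    }


# Toy transcript: one long gap between "dog" and "ran", two short gaps.
example = [("the", 0.0, 0.2), ("dog", 0.3, 0.6), ("ran", 1.6, 1.9), ("away", 2.0, 2.5)]
features = pause_features(example)
```

A feature dictionary like this could, under the same assumptions, be concatenated with semantic coherence scores and passed to a regressor such as scikit-learn's `SVR`, which is the general kind of combination the abstract describes.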