In this paper, we propose CXR-TFT, a multimodal framework for predicting clinical outcomes in intensive care unit (ICU) patients. CXR-TFT forecasts changes in chest X-ray (CXR) findings in critically ill patients by integrating irregularly timed CXR images, radiology reports, and high-frequency clinical data such as vital signs, laboratory results, and respiratory flow charts. Latent vectors extracted by the image encoder are aligned with the clinical data on a common time grid through temporal interpolation, and a transformer model then predicts the CXR latent vector for each hour, conditioned on the preceding latent vectors and clinical measurements. In a retrospective study of 20,000 ICU patients, CXR-TFT predicted abnormal CXR findings with high accuracy up to 12 hours before they became radiologically apparent. This capability could substantially improve the management of time-sensitive conditions such as acute respiratory distress syndrome (ARDS), where early intervention is crucial and diagnosis is often delayed.
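The alignment step can be illustrated with a minimal sketch. The abstract does not specify the interpolation scheme, so this example assumes per-dimension linear interpolation of the encoder latents onto an hourly grid. It also assumes that the transformer's training examples are fixed-length windows of past hourly features paired with the next hour's latent vector. The names `interpolate_latents`, `make_next_hour_pairs`, and `window` are hypothetical and are not taken from CXR-TFT.

```python
import numpy as np

def interpolate_latents(cxr_times_h, cxr_latents, grid_h):
    """Linearly interpolate irregularly timed CXR latents onto an hourly grid.

    Assumption (hypothetical): linear interpolation per latent dimension.
    cxr_times_h: (n,) acquisition times in hours, strictly increasing
    cxr_latents: (n, d) latent vectors from the image encoder
    grid_h:      (m,) hourly time points shared with the clinical data
    Returns an (m, d) array. Grid points before the first or after the last
    CXR take the nearest endpoint value, because np.interp clamps at the edges.
    """
    cxr_times_h = np.asarray(cxr_times_h, dtype=float)
    cxr_latents = np.asarray(cxr_latents, dtype=float)
    grid_h = np.asarray(grid_h, dtype=float)
    # Interpolate each latent dimension independently, then stack as columns.
    columns = [np.interp(grid_h, cxr_times_h, cxr_latents[:, j])
               for j in range(cxr_latents.shape[1])]
    return np.stack(columns, axis=1)

def make_next_hour_pairs(latent_grid, clinical_grid, window):
    """Build hypothetical (history, next-hour latent) training pairs.

    Each input is the last `window` hours of [latent | clinical] features;
    each target is the latent vector at the following hour.
    """
    features = np.concatenate([latent_grid, clinical_grid], axis=1)
    m = features.shape[0]
    inputs = np.stack([features[t - window:t] for t in range(window, m)])
    targets = latent_grid[window:]
    return inputs, targets

# Usage: two CXRs taken 4 hours apart, 3 clinical features per hour.
grid = interpolate_latents([0.0, 4.0], [[0.0, 0.0], [4.0, 8.0]], np.arange(7))
X, y = make_next_hour_pairs(grid, np.ones((7, 3)), window=3)
print(grid[2], X.shape, y.shape)
```

Linear interpolation is only one plausible choice. Because the encoder latents live in a learned space, a model could equally carry the last observed latent forward or learn the interpolation itself.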