This page curates AI-related papers from around the world. All content is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
Benchmarking Foundation Models with Multimodal Public Electronic Health Records
Created by
Haebom
Author
Kunyu Yu, Rui Yang, Jingchi Liao, Siqi Li, Huitao Li, Irene Li, Yifan Peng, Rishikesan Kamaleswaran, Nan Liu
Outline
This study presents a comprehensive benchmark, built on the publicly available MIMIC-IV database, for evaluating the performance, fairness, and interpretability of foundation models in both unimodal and multimodal settings. To assess how well foundation models handle the diverse data modalities found in electronic health records (EHRs), we develop a standardized data processing pipeline that harmonizes heterogeneous clinical records into an analyzable format. We systematically compare eight foundation models, spanning unimodal and multimodal as well as domain-specific and general-purpose variants, and demonstrate that integrating multiple data modalities consistently improves predictive performance without introducing additional bias. The goal of this benchmark is to support the development of effective and reliable multimodal artificial intelligence (AI) systems for real-world clinical applications. The code is available at https://github.com/nliulab/MIMIC-Multimodal.
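As a rough illustration of the kind of comparison such a benchmark runs, the sketch below contrasts unimodal linear probes with a simple early-fusion (concatenation) multimodal probe on synthetic stand-ins for frozen foundation-model embeddings. This is not the authors' pipeline; the embedding sizes, the fusion strategy, the linear probe, and the synthetic outcome are all assumptions made purely for illustration.

```python
# Minimal illustrative sketch (not the authors' pipeline): compare unimodal
# predictors against a simple early-fusion multimodal predictor on synthetic
# stand-ins for EHR-derived foundation-model embeddings. All shapes, names,
# and the concatenation-based fusion choice are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_patients = 2000

# Synthetic embeddings for two modalities: structured time-series features
# (e.g., vitals/labs encoder output) and clinical-note text features.
structured_emb = rng.normal(size=(n_patients, 64))
text_emb = rng.normal(size=(n_patients, 128))

# Synthetic binary outcome that depends weakly on both modalities.
logits = structured_emb[:, :4].sum(axis=1) + text_emb[:, :4].sum(axis=1)
y = (logits + rng.normal(scale=2.0, size=n_patients) > 0).astype(int)

def evaluate(features: np.ndarray, label: str) -> None:
    """Train a linear probe on frozen embeddings and report test AUROC."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, y, test_size=0.3, random_state=0, stratify=y
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{label:<22} AUROC = {auc:.3f}")

evaluate(structured_emb, "structured only")
evaluate(text_emb, "text only")
# Early fusion: concatenate per-modality embeddings before the probe.
evaluate(np.concatenate([structured_emb, text_emb], axis=1), "multimodal (fused)")
```

In this toy setup the fused probe sees signal from both modalities, mirroring (in spirit only) the paper's finding that integrating modalities improves predictive performance; fairness and interpretability analyses are outside the scope of this sketch.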
•
Takeaways: Shows that multimodal learning improves the predictive performance of EHR foundation models without increasing bias; the standardized data processing pipeline and benchmark improve the reproducibility and comparability of future studies; and the work contributes to the development of effective and reliable multimodal AI systems for real-world clinical applications.
•
Limitations: Generalizability may be limited by the characteristics of the MIMIC-IV database; the range of evaluated models is limited; and interpretability requires further in-depth evaluation.