
Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Benchmarking Foundation Models with Multimodal Public Electronic Health Records

Created by
  • Haebom

Author

Kunyu Yu, Rui Yang, Jingchi Liao, Siqi Li, Huitao Li, Irene Li, Yifan Peng, Rishikesan Kamaleswaran, Nan Liu

Outline

This study presents a comprehensive benchmark for evaluating the performance, fairness, and interpretability of foundation models in both unimodal and multimodal learning, using the publicly available MIMIC-IV database. To assess how flexibly foundation models handle the diverse data modalities found in electronic health records (EHRs), we develop a standardized data processing pipeline that harmonizes heterogeneous clinical records into an analyzable format. We systematically compare eight foundation models, spanning unimodal and multimodal as well as domain-specific and general-purpose variants, and show that integrating multiple data modalities consistently improves predictive performance without introducing additional bias. The benchmark aims to support the development of effective and trustworthy multimodal artificial intelligence (AI) systems for real-world clinical applications. The code is available at https://github.com/nliulab/MIMIC-Multimodal.
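To make the pipeline idea concrete, here is a minimal sketch of how heterogeneous MIMIC-IV tables (structured admissions, time-series labs, and free-text notes) could be reconciled into one analyzable record per hospital admission. The file paths, column choices, and 48-hour aggregation window are illustrative assumptions based on the public MIMIC-IV schema; the authors' actual pipeline is in the linked repository.

```python
# Minimal sketch (not the authors' pipeline): harmonize heterogeneous
# MIMIC-IV tables into one flat record per hospital admission.
# Paths, columns, and the 48h window are assumptions for illustration.
import pandas as pd

def build_multimodal_records(root: str) -> pd.DataFrame:
    # Structured demographics/outcome: one row per admission.
    adm = pd.read_csv(
        f"{root}/hosp/admissions.csv",
        usecols=["subject_id", "hadm_id", "admittime", "hospital_expire_flag"],
        parse_dates=["admittime"],
    )

    # Time-series labs: keep the first 48 hours of each admission and
    # reduce them to per-item summary statistics (mean per lab item).
    labs = pd.read_csv(
        f"{root}/hosp/labevents.csv",
        usecols=["hadm_id", "itemid", "charttime", "valuenum"],
        parse_dates=["charttime"],
    )
    labs = labs.merge(adm[["hadm_id", "admittime"]], on="hadm_id")
    labs = labs[labs["charttime"] <= labs["admittime"] + pd.Timedelta(hours=48)]
    lab_feats = (
        labs.pivot_table(index="hadm_id", columns="itemid",
                         values="valuenum", aggfunc="mean")
            .add_prefix("lab_")
            .reset_index()
    )

    # Free-text notes: one discharge summary per admission
    # (hypothetical path; MIMIC-IV notes ship as a separate module).
    notes = pd.read_csv(f"{root}/note/discharge.csv",
                        usecols=["hadm_id", "text"])
    notes = notes.drop_duplicates("hadm_id").rename(columns={"text": "note_text"})

    # Reconcile all modalities on hadm_id into one analyzable table that
    # downstream unimodal or multimodal models can consume.
    return (adm.merge(lab_feats, on="hadm_id", how="left")
               .merge(notes, on="hadm_id", how="left"))

if __name__ == "__main__":
    df = build_multimodal_records("mimic-iv")  # hypothetical data root
    print(df.shape, list(df.columns[:5]))
```

Flattening each modality onto the shared `hadm_id` key is one simple way to get the "analyzable format" the abstract describes; richer pipelines would keep the lab time series unaggregated for sequence models.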

Takeaways, Limitations

Takeaways: Multimodal learning improves the predictive performance of EHR foundation models without increasing bias (a subgroup-evaluation sketch illustrating this kind of check follows below). The standardized data processing pipeline and benchmark improve the reproducibility and comparability of future studies, and support the development of effective and trustworthy multimodal AI systems for real-world clinical applications.
Limitations: Because the benchmark is built on a single database (MIMIC-IV), generalizability to other institutions and populations may be limited. Only eight models are evaluated, so coverage of the model landscape is narrow. A more in-depth evaluation of interpretability is still needed.
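As referenced above, a claim like "better performance without additional bias" is typically checked by comparing discrimination overall and within demographic subgroups. The sketch below is an illustration of that kind of check, not the paper's exact protocol; `y_true`, the score arrays, and the `group` labels are hypothetical inputs.

```python
# Minimal sketch (illustrative, not the paper's protocol): AUROC overall
# and per demographic subgroup, plus the max-min subgroup gap, so a
# performance gain can be weighed against any widening of subgroup gaps.
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auroc(y_true, y_score, group):
    """Return (overall AUROC, per-subgroup AUROC dict, max-min gap)."""
    y_true, y_score, group = map(np.asarray, (y_true, y_score, group))
    per_group = {}
    for g in np.unique(group):
        mask = group == g
        if len(np.unique(y_true[mask])) == 2:  # AUROC needs both classes
            per_group[g] = roc_auc_score(y_true[mask], y_score[mask])
    gap = (max(per_group.values()) - min(per_group.values())
           if per_group else float("nan"))
    return roc_auc_score(y_true, y_score), per_group, gap

# Hypothetical usage with two models' predicted risks:
# overall_u, by_group_u, gap_u = subgroup_auroc(y, scores_unimodal, sex)
# overall_m, by_group_m, gap_m = subgroup_auroc(y, scores_multimodal, sex)
# A higher overall AUROC with a similar (or smaller) subgroup gap would
# support "better performance without additional bias".
```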