Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CEHR-XGPT: A Scalable Multi-Task Foundation Model for Electronic Health Records

Created by
  • Haebom

Author

Chao Pang, Jiheum Park, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S. Kalluri, Shalmali Joshi, Noémie Elhadad, Karthik Natarajan

Outline

CEHR-XGPT is a general-purpose foundation model for electronic health record (EHR) data that integrates three essential capabilities (feature representation, zero-shot prediction, and synthetic data generation) into a single architecture. To support temporal reasoning over clinical sequences, it incorporates a novel temporal token-based learning framework that explicitly encodes each patient's timeline into the model structure. It demonstrates robust performance across all three tasks and generalizes effectively to external datasets through vocabulary expansion and fine-tuning. This versatility enables rapid model development, cohort discovery, and patient outcome prediction without task-specific retraining.
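
To make the temporal-token idea concrete, here is a minimal sketch of one way a patient history could be flattened into a token sequence with explicit time-gap tokens between visits. The summary does not specify the paper's actual scheme, so the token names ([VS], [VE], [D3], [M1], [LT]), the bucket boundaries, and the helper functions time_token and build_sequence are all illustrative assumptions.

```python
from datetime import date

# Hypothetical time-gap buckets; the paper's real tokens and boundaries
# are not given in this summary.
def time_token(gap_days: int) -> str:
    """Map the gap between two visits to a coarse time token."""
    if gap_days < 7:
        return f"[D{gap_days}]"        # gap in days
    if gap_days < 30:
        return f"[W{gap_days // 7}]"   # gap in whole weeks
    if gap_days < 365:
        return f"[M{gap_days // 30}]"  # gap in approximate months
    return "[LT]"                      # long-term gap (a year or more)

def build_sequence(visits):
    """Flatten (visit_date, [codes]) pairs into one token sequence.

    A time token is placed between consecutive visits, and each visit is
    wrapped in assumed start/end markers [VS] and [VE].
    """
    tokens = []
    prev_date = None
    for visit_date, codes in sorted(visits, key=lambda v: v[0]):
        if prev_date is not None:
            tokens.append(time_token((visit_date - prev_date).days))
        tokens.append("[VS]")
        tokens.extend(codes)
        tokens.append("[VE]")
        prev_date = visit_date
    return tokens

# Toy patient with made-up visits, for illustration only.
example_visits = [
    (date(2024, 1, 1), ["I10"]),
    (date(2024, 1, 4), ["E11.9", "metformin"]),
    (date(2024, 3, 1), ["I10"]),
]
example_tokens = build_sequence(example_visits)
```

Because elapsed time appears as ordinary tokens in the sequence, a sequence model trained on such input could in principle both read time gaps (for prediction) and emit them (for synthetic patient generation), which is one plausible reason to encode time this way.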

Takeaways, Limitations

Takeaways:
Presents a general-purpose foundation model for EHR data that can be applied to a range of tasks.
Introduces a novel temporal token-based learning framework for temporal reasoning.
Improves model development efficiency by integrating zero-shot prediction and synthetic data generation.
Demonstrates generalizability to external datasets.
Limitations:
The paper does not explicitly discuss its limitations. Further research is needed on potential data bias, interpretability, and ethical issues that may arise when the model is applied in real-world clinical settings.