Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Record

Created by
  • Haebom

Author

Mosbah Aouad, Anirudh Choudhary, Awais Farooq, Steven Nevers, Lusine Demirkhanyan, Bhrandon Harris, Suguna Pappu, Christopher Gondi, Ravishankar Iyer

Outline

Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer, and early diagnosis is challenging because its symptoms are nonspecific and reliable biomarkers are lacking. In this study, we propose a novel multimodal approach that integrates longitudinal diagnosis code histories from electronic health records with routinely collected laboratory measurements. The method uses neural controlled differential equations to model irregularly sampled laboratory time series, pretrained language models and recurrent neural networks to learn representations of diagnosis code trajectories, and a cross-attention mechanism to capture interactions between the two modalities. We developed and evaluated this approach on a real-world dataset of approximately 4,700 patients, where it improved AUC by 6.5% to 15.5% over state-of-the-art methods. We also identified a panel of diagnosis codes and laboratory tests associated with increased PDAC risk, including both established and novel biomarkers. The code is available at https://github.com/MosbahAouad/EarlyPDAC-MML .
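To make the three components described above more concrete, here is a minimal, untrained NumPy sketch of how they could fit together. It is not the authors' implementation (see their repository for that), and several choices are assumptions: random weights stand in for trained parameters, random vectors stand in for pretrained language model embeddings of diagnosis codes, a GRU is used as the recurrent network, the neural CDE is solved with a simple Euler scheme over a piecewise-linear path, and the fusion (cross-attention in both directions, mean pooling, logistic output) is illustrative. All function names (`cde_encode`, `gru_encode`, `cross_attend`, `pdac_risk`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # illustrative width, not taken from the paper


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dense(n_out, n_in):
    """Random weights standing in for trained parameters (hypothetical)."""
    return rng.normal(scale=0.3, size=(n_out, n_in))


def cde_encode(times, labs, steps=40):
    """Neural CDE over irregular labs: Euler solve of dz = f(z) dX(t)."""
    grid = np.linspace(times[0], times[-1], steps)
    # Piecewise-linear control path X(t) through the observations; time is channel 0.
    X = np.column_stack([grid] + [np.interp(grid, times, labs[:, j]) for j in range(labs.shape[1])])
    d_in = X.shape[1]
    W0, W1, W2 = dense(HIDDEN, d_in), dense(HIDDEN, HIDDEN), dense(HIDDEN * d_in, HIDDEN)
    b1 = rng.normal(scale=0.3, size=HIDDEN)
    z = np.tanh(W0 @ X[0])  # initial hidden state from the first observation
    states = []
    for k in range(steps - 1):
        # Vector field f(z) is a HIDDEN x d_in matrix produced by a small MLP.
        f = np.tanh(W2 @ np.tanh(W1 @ z + b1)).reshape(HIDDEN, d_in)
        z = z + f @ (X[k + 1] - X[k])  # Euler step driven by the path increment
        states.append(z)
    return np.array(states)


def gru_encode(code_embs):
    """GRU over a sequence of diagnosis-code embeddings."""
    d = code_embs.shape[1]
    Wz, Wr, Wh = dense(HIDDEN, d), dense(HIDDEN, d), dense(HIDDEN, d)
    Uz, Ur, Uh = dense(HIDDEN, HIDDEN), dense(HIDDEN, HIDDEN), dense(HIDDEN, HIDDEN)
    h = np.zeros(HIDDEN)
    states = []
    for x in code_embs:
        u = sigmoid(Wz @ x + Uz @ h)  # update gate
        r = sigmoid(Wr @ x + Ur @ h)  # reset gate
        h_new = np.tanh(Wh @ x + Uh @ (r * h))
        h = (1 - u) * h + u * h_new
        states.append(h)
    return np.array(states)


def cross_attend(queries, context):
    """Scaled dot-product attention: queries from one modality, keys/values from the other."""
    Wq, Wk, Wv = dense(HIDDEN, HIDDEN), dense(HIDDEN, HIDDEN), dense(HIDDEN, HIDDEN)
    Q, K, V = queries @ Wq.T, context @ Wk.T, context @ Wv.T
    scores = Q @ K.T / np.sqrt(HIDDEN)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over context positions
    return weights @ V, weights


def pdac_risk(times, labs, code_embs):
    """Hypothetical fusion head: bidirectional cross-attention, mean pooling, logistic output."""
    lab_states = cde_encode(times, labs)
    code_states = gru_encode(code_embs)
    codes_to_labs, _ = cross_attend(code_states, lab_states)
    labs_to_codes, _ = cross_attend(lab_states, code_states)
    fused = np.concatenate([codes_to_labs.mean(axis=0), labs_to_codes.mean(axis=0)])
    w_out = rng.normal(scale=0.3, size=fused.shape[0])
    return float(sigmoid(w_out @ fused))


times = np.array([0.0, 0.7, 3.1, 3.4, 9.0])  # irregular visit times (toy, in months)
labs = rng.normal(size=(5, 3))  # 3 toy lab values per visit
code_embs = rng.normal(size=(6, 16))  # stand-in for pretrained LM embeddings of 6 codes
print(f"toy risk score (untrained): {pdac_risk(times, labs, code_embs):.3f}")
```

The neural CDE is what lets the lab encoder consume observations at arbitrary timestamps: the hidden state evolves continuously and is updated in proportion to how much the interpolated lab path changes, rather than requiring a fixed sampling grid.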

Takeaways, Limitations

Takeaways:
Integrating multimodal data from electronic health records (diagnosis code histories and laboratory measurements) improved early PDAC detection performance.
The model improved AUC by 6.5% to 15.5% over state-of-the-art methods.
Identified a panel of diagnosis codes and laboratory tests associated with increased PDAC risk, including novel biomarkers.
The code for the developed model is publicly available to support reproducibility and further research.
Limitations:
Although the dataset size (approximately 4,700 patients) is reported, its diversity is not explicitly described, so further validation of generalizability may be necessary.
The biological mechanisms linking the identified diagnosis codes and laboratory tests to increased PDAC risk may require further elucidation.
Further research is needed to determine whether the approach generalizes to other types of cancer or other diseases.
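The AUC figures in the takeaways summarize how well a model ranks patients who develop PDAC above those who do not. As a minimal illustration using toy scores (not data from the study), AUC can be computed as the fraction of positive/negative pairs that the model orders correctly, with ties counted as half; this pairwise statistic equals the area under the ROC curve. The function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # toy model outputs
labels = [1, 1, 1, 0, 0, 0]  # 1 = PDAC, 0 = control (toy)
print(auc(scores, labels))  # 8 of 9 pairs ranked correctly -> 0.888...
```

The pairwise form is quadratic in the number of patients; rank-based formulas give the same value more efficiently on large cohorts.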