Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics

Created by
  • Haebom

Authors

Rushin H. Gindra, Giovanni Palla, Mathias Nguyen, Sophia J. Wagner, Manuel Tran, Fabian J Theis, Dieter Saur, Lorin Crawford, Tingying Peng

Outline

This paper presents HESCAPE, a large-scale benchmark for evaluating multimodal learning methods that jointly leverage tissue morphology images and gene expression data in spatial transcriptomics. Using a curated pan-organ dataset spanning six gene panels and 54 donors, the authors systematically evaluate state-of-the-art image and gene expression encoders under various pre-training strategies and assess their effectiveness on two downstream tasks: gene mutation classification and gene expression prediction.

The study shows that the gene expression encoder is the key determinant of robust cross-modal alignment: gene models pre-trained on spatial transcriptomics data outperform both models trained without spatial data and simple baselines. However, the downstream evaluations reveal a paradox: while contrastive pre-training consistently improves gene mutation classification, it degrades direct gene expression prediction compared to baseline encoders trained without cross-modal objectives. Batch effects are identified as a key factor hindering effective cross-modal alignment, underscoring the need for batch-robust multimodal learning approaches in spatial transcriptomics. Finally, the authors open-source HESCAPE, providing a standardized dataset, evaluation protocols, and benchmarking tools.
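The contrastive pre-training evaluated in the benchmark pairs a histology image encoder with a gene expression encoder so that embeddings from the same spatial spot are pulled together. A minimal CLIP-style symmetric InfoNCE sketch of that objective is shown below; the function names, dimensions, and temperature value are illustrative assumptions, not HESCAPE's actual implementation:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere before computing similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_loss(img_emb, gene_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired spot embeddings.

    img_emb:  (N, D) embeddings from the histology image encoder
    gene_emb: (N, D) embeddings from the gene expression encoder
    Row i of each matrix comes from the same spatial spot, so the
    matching pairs lie on the diagonal of the similarity matrix.
    """
    img = l2_normalize(img_emb)
    gene = l2_normalize(gene_emb)
    logits = img @ gene.T / temperature      # (N, N) cosine similarities
    labels = np.arange(len(logits))          # positives on the diagonal

    def xent(l):
        # Row-wise softmax cross-entropy against the diagonal labels.
        l = l - l.max(axis=1, keepdims=True)            # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image->gene and gene->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss aligns the two modalities in a shared embedding space; the paper's finding is that this helps mutation classification but can hurt direct expression prediction.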

Takeaways, Limitations

Takeaways:
  • HESCAPE provides a large-scale benchmark for evaluating multimodal learning methods on spatial transcriptomics data.
  • Gene expression encoders play a crucial role in cross-modal expression alignment.
  • Pre-training on spatial transcriptomics data improves gene mutation classification performance.
  • The influence of batch effects on multimodal learning is systematically examined.
  • The results highlight the need for batch-robust multimodal learning methods in spatial transcriptomics.
Limitations:
  • Contrastive pre-training improves gene mutation classification but degrades direct gene expression prediction.
  • Batch effects are not fully addressed; further research on multimodal learning methods that are robust to batch effects is needed.