SToFM is a novel multi-scale foundation model for spatial transcriptomics (ST) data. It addresses the difficulty of extracting information at three scales from massive, complex ST data: macroscopic tissue morphology, the microscopic cellular microenvironment, and gene-level expression profiles. For each ST slice, SToFM performs multi-scale information extraction to construct a set of ST sub-slices that aggregate macro-scale, micro-scale, and gene-scale information. It then applies an SE(2) Transformer to these sub-slices to obtain high-quality cell representations. In addition, we build SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus for pre-training. SToFM achieves excellent performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating a comprehensive understanding of ST data through the capture and integration of multi-scale information.
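To make the SE(2) idea concrete, the sketch below illustrates the property that an SE(2)-aware model relies on: pairwise distances between cell positions do not change when a slice is rotated or translated in the plane. This is not SToFM's implementation. The function names, the random coordinates, and the choice of pairwise distance as the spatial feature are all assumptions made for illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """Return the (n, n) matrix of Euclidean distances between 2D points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def apply_se2(coords, theta, shift):
    """Rotate 2D points by theta (radians), then translate them by shift."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return coords @ rotation.T + shift

# Hypothetical cell centroids on one slice (illustrative values only).
rng = np.random.default_rng(0)
cells = rng.uniform(0.0, 100.0, size=(5, 2))

# The same slice after an arbitrary rotation and translation.
moved = apply_se2(cells, theta=0.7, shift=np.array([12.0, -3.5]))

# Distances between cells are unchanged, so features built from them
# do not depend on how the slice happens to be oriented or positioned.
invariant = np.allclose(pairwise_distances(cells), pairwise_distances(moved))
```

A model that consumes only such invariant quantities as its spatial input should yield the same cell representations regardless of how a tissue section was placed on the slide, which is one plausible motivation for using an SE(2) architecture on ST data.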