This paper proposes FoMo, a foundation model for diverse mobile network tasks such as base station placement, resource allocation, and energy optimization. FoMo combines a diffusion model with a transformer to handle multiple prediction tasks, including short-term and long-term forecasting and distribution generation across multiple cities. It captures the distinctive features of each task through task-specific spatiotemporal masks, and it improves transfer learning with a contrastive learning strategy that links mobile traffic to the surrounding urban environment. Experiments on nine real-world datasets show that FoMo outperforms existing models on these prediction tasks as well as in zero- and few-shot settings, indicating strong generalizability.
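
To make the task-specific masking idea concrete, here is a minimal, hypothetical sketch (the helper name `build_task_mask`, the task labels, and the horizon values are assumptions for illustration, not the paper's actual API): different prediction tasks are expressed as different masks over the same traffic series, and the masked input is what the transformer/diffusion backbone would then complete.

```python
import torch

def build_task_mask(seq_len: int, task: str, horizon: int = 6) -> torch.Tensor:
    """Illustrative temporal mask: 1 = visible to the model, 0 = to be predicted.

    Hypothetical helper; the paper's spatiotemporal masking scheme may differ.
    """
    mask = torch.ones(seq_len)
    if task == "short_term":        # hide only the last few steps
        mask[-horizon:] = 0
    elif task == "long_term":       # hide a longer tail of the series
        mask[seq_len // 2:] = 0
    elif task == "generation":      # hide everything: generate a full traffic distribution
        mask[:] = 0
    return mask

# Example: a 48-step traffic series for one base station
traffic = torch.rand(48)
mask = build_task_mask(48, "short_term")
visible = traffic * mask  # masked input; the hidden portion is what the model reconstructs
```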