This paper presents a framework for early detection of breast cancer that integrates a Vision Transformer (ViT) with a Graph Neural Network (GNN). On the CBIS-DDSM dataset, the framework achieves a detection accuracy of up to 84.2%. The ViT models global image features, while the GNN models structural relationships; together they outperform existing methods. Interpretable attention heatmaps support physicians' clinical judgment.
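To make the idea of modeling structural relationships over image features concrete, the following is a minimal sketch rather than the method evaluated in this paper. It assumes that ViT patch embeddings serve as graph nodes, that edges connect 4-neighbouring patches on the image grid, and that one message-passing step is a plain mean over each node and its neighbours. The abstract specifies none of these choices, and the names `grid_adjacency` and `mean_aggregate` are hypothetical.

```python
from typing import Dict, List


def grid_adjacency(rows: int, cols: int) -> Dict[int, List[int]]:
    """Build a 4-neighbour adjacency list over a rows x cols patch grid.

    Assumption: patches are indexed row-major, as a ViT tokenizer would order them.
    """
    adj: Dict[int, List[int]] = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    neighbours.append(nr * cols + nc)
            adj[r * cols + c] = neighbours
    return adj


def mean_aggregate(
    features: List[List[float]], adj: Dict[int, List[int]]
) -> List[List[float]]:
    """One message-passing step: average each node with its neighbours.

    Assumption: an unweighted mean stands in for a learned GNN layer.
    """
    updated = []
    for node, feat in enumerate(features):
        group = [feat] + [features[n] for n in adj[node]]
        updated.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(len(feat))]
        )
    return updated


# Toy stand-in for ViT patch embeddings: a 2 x 2 grid of 2-dimensional vectors.
patch_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
adjacency = grid_adjacency(2, 2)
updated = mean_aggregate(patch_embeddings, adjacency)
```

In this sketch each updated embedding mixes a patch with its spatial neighbours, which is one simple way a graph layer can inject local structure on top of globally attended features. A trained model would replace the mean with learned, weighted aggregation.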