This paper focuses on Semantic Occupancy Prediction (SOP), which infers occupancy and semantic information in unobserved areas to compensate for the incompleteness of sensor data (LiDAR and camera) in autonomous driving. Existing Transformer-based SOP methods lack spatial structure modeling. To address this, we propose Spatially Aware Windowed Attention (SWA), a novel mechanism that integrates local spatial context into attention. SWA achieves state-of-the-art performance on LiDAR-based SOP benchmarks and is also shown to be applicable to camera-based SOP.