This paper addresses the challenge of developing a manipulation policy capable of spatial understanding — reasoning about 3D geometry, object relationships, and the robot's own embodiment. Existing 3D point cloud models lack semantic abstraction, while 2D image encoders struggle with spatial reasoning. To address these limitations, we propose the Spatial Enhanced Manipulation model (SEM), a novel diffusion-based policy framework that explicitly enhances spatial understanding from two complementary perspectives: a spatial enhancer augments visual representations with 3D geometric context, and a robot state encoder captures embodiment-aware structure through graph-based modeling of joint dependencies. By integrating these modules, SEM significantly improves spatial understanding, leading to robust and generalizable manipulation that outperforms existing baselines across a diverse set of tasks.