CaSTFormer is a driving intention prediction model designed to improve the safety and interaction efficiency of human-machine co-driving systems. It was proposed to address two limitations of existing models: they struggle to model complex spatiotemporal interdependencies accurately, and they struggle to handle the unpredictable variability of human driving behavior. CaSTFormer explicitly models the causal relationship between driver behavior and environmental context through three components: the Reciprocal Shift Fusion (RSF) mechanism, which performs accurate temporal alignment; the Causal Pattern Extraction (CPE) module, which removes spurious correlations; and the Feature Synthesis Network (FSN), which synthesizes the resulting representations into a consistent form for spatiotemporal inference. CaSTFormer achieves state-of-the-art performance on the Brain4Cars dataset and captures complex causal spatiotemporal dependencies, improving both the accuracy and the transparency of driving intention prediction.