This paper proposes Physiology-aware and Task-invariant Spatio-temporal Modeling (PTSM), a novel framework for cross-subject electroencephalogram (EEG) decoding, a central challenge in brain-computer interface (BCI) research. PTSM employs a dual-branch masking mechanism that jointly learns subject-specific (personalized) and task-invariant (shared) features, enabling interpretable and robust cross-subject decoding. By factorizing the masks along the temporal and spatial dimensions, PTSM captures fine-grained dynamic EEG patterns at low computational cost. In addition, information-theoretic constraints disentangle the latent embeddings into task-relevant and subject-specific subspaces, mitigating representational entanglement. A multi-objective loss combining classification, contrastive, and separation terms is used to optimize the model. Extensive experiments on a cross-subject motor imagery dataset demonstrate that PTSM achieves robust zero-shot generalization, outperforming state-of-the-art baselines. These results highlight the effectiveness of disentangled neural representations for personalized yet transferable decoding under heterogeneous neurophysiological conditions.
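To make the abstract's architectural idea concrete, the following is a minimal sketch, not the authors' implementation: it illustrates factorized spatio-temporal masks on two branches (task-invariant vs. subject-specific) and a multi-objective loss with classification, contrastive, and separation terms. All module names, tensor shapes, encoder choices, and loss weights are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch of dual-branch factorized spatio-temporal masking (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualBranchMasking(nn.Module):
    """Two branches, each with a learnable spatial mask (per channel) and a
    learnable temporal mask (per time step); their outer product gates the EEG input."""

    def __init__(self, n_channels=22, n_times=1000, n_classes=4, feat_dim=64):
        super().__init__()
        # Factorized masks: (channels,) and (times,) instead of a full (channels x times) mask.
        self.task_spatial = nn.Parameter(torch.zeros(n_channels))
        self.task_temporal = nn.Parameter(torch.zeros(n_times))
        self.subj_spatial = nn.Parameter(torch.zeros(n_channels))
        self.subj_temporal = nn.Parameter(torch.zeros(n_times))
        # Shared lightweight encoder; separate per-branch encoders are equally plausible.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_channels, feat_dim, kernel_size=25, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def _mask(self, spatial, temporal):
        # Sigmoid keeps mask values in (0, 1); the rank-1 outer product is the factorization.
        return torch.sigmoid(spatial)[:, None] * torch.sigmoid(temporal)[None, :]

    def forward(self, x):  # x: (batch, channels, times)
        z_task = self.encoder(x * self._mask(self.task_spatial, self.task_temporal))
        z_subj = self.encoder(x * self._mask(self.subj_spatial, self.subj_temporal))
        logits = self.classifier(z_task)  # only the task branch drives classification
        return logits, z_task, z_subj


def total_loss(logits, y, z_task, z_subj, lam_con=0.1, lam_sep=0.1):
    """Multi-objective loss: classification + a simple contrastive surrogate
    (pull same-class task embeddings together) + a separation term that
    discourages overlap between task and subject embeddings."""
    cls = F.cross_entropy(logits, y)
    z = F.normalize(z_task, dim=1)
    sim = z @ z.t()  # pairwise cosine similarities within the batch
    same = (y[:, None] == y[None, :]).float() - torch.eye(len(y), device=y.device)
    con = -(sim * same).sum() / same.sum().clamp(min=1.0)
    sep = F.cosine_similarity(z_task, z_subj, dim=1).abs().mean()
    return cls + lam_con * con + lam_sep * sep


# Usage on random data shaped like a motor-imagery batch.
model = DualBranchMasking()
x = torch.randn(8, 22, 1000)
y = torch.randint(0, 4, (8,))
logits, z_task, z_subj = model(x)
loss = total_loss(logits, y, z_task, z_subj)
loss.backward()
```

The rank-1 mask factorization is what keeps the overhead low: each branch learns only n_channels + n_times parameters instead of a full channel-by-time mask, while the separation term operationalizes the disentanglement constraint in a simplified (cosine-similarity) form.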