This paper introduces MORPH, a shape-invariant autoregressive foundation model for partial differential equations (PDEs). MORPH is built on a convolutional vision transformer backbone that handles multiple fields with varying spatial dimensionality (1D–3D), resolutions, and mixed scalar and vector components. The architecture combines (i) component-wise convolution, which jointly processes scalar and vector channels to capture local interactions; (ii) cross-field attention, which models and selectively propagates information between different physical fields; and (iii) axial attention, which factorizes global spatiotemporal self-attention along individual spatial and temporal axes, reducing computational cost while preserving expressiveness. We pretrain multiple model variants on diverse heterogeneous PDE datasets and evaluate their transfer to a range of downstream prediction tasks. In both zero-shot and full-shot generalization, using either full-model fine-tuning or parameter-efficient low-rank adapters (LoRA), MORPH outperforms models trained from scratch. Across extensive evaluations, MORPH matches or surpasses strong baselines and state-of-the-art models.
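To make the factorization in (iii) concrete, the sketch below shows axial self-attention in plain NumPy: instead of attending over all T·H·W spatiotemporal tokens at once (cost quadratic in T·H·W), attention is applied along one axis at a time, treating the remaining axes as batch dimensions. This is a minimal single-head illustration under assumed shapes, not the MORPH implementation; the function and variable names are hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attend_along_axis(x, axis, Wq, Wk, Wv):
    """Single-head scaled dot-product attention along one axis of x.

    All other axes are treated as batch dimensions, so the cost is
    quadratic only in the length of the attended axis.
    """
    # Move the attended axis to position -2 so x becomes (..., L, C).
    xm = np.moveaxis(x, axis, -2)
    q, k, v = xm @ Wq, xm @ Wk, xm @ Wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v
    return np.moveaxis(out, -2, axis)

rng = np.random.default_rng(0)
C = 8                                      # channel (embedding) width
x = rng.standard_normal((4, 5, 6, C))      # (T, H, W, C) spatiotemporal field
Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

# Factorized spatiotemporal attention: one pass per axis (T, then H, then W).
y = x
for ax in range(3):
    y = attend_along_axis(y, ax, Wq, Wk, Wv)
print(y.shape)  # → (4, 5, 6, 8), same shape as the input field
```

The per-axis passes reduce the attention cost from O((T·H·W)²) for full spatiotemporal attention to O(T·H·W·(T + H + W)), while stacked passes still let information propagate between any pair of tokens.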