StefaLand is a generative spatiotemporal Earth-based model for predicting land surface responses to climate change and the associated human feedbacks. The model outperforms existing state-of-the-art models across four tasks and five datasets, with prediction targets that include streamflow, soil moisture, and soil composition. Built on a masked autoencoder backbone, a position-aware architecture, attribute-based representations, and a residual fine-tuning adapter, StefaLand generalizes well across diverse data-sparse regions and supports a wide range of land surface applications.