Goal-Conditioned Behavioral Cloning (GCBC) methods perform well on the tasks they were trained on, but often fail to generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. This limitation can be attributed in part to the lack of temporal consistency in the state representations learned by BC: if temporally correlated states are encoded into similar latent representations, the out-of-distribution gap for novel state-goal pairs can be reduced. In this paper, we formalize this idea by showing that encouraging long-term temporal consistency through successor representations (SRs) promotes generalization. We also propose $\text{BYOL-}\gamma$, a simple yet effective representation learning objective for GCBC. This objective provably approximates successor representations via self-predictive learning in finite MDPs, and it achieves competitive empirical performance on a suite of challenging combinatorial generalization tasks.
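The core idea above can be illustrated with a minimal sketch: a BYOL-style self-predictive loss in which the online encoder predicts the (frozen) target encoder's representation of a future state whose time offset is drawn geometrically with parameter $\gamma$, mirroring the discounted state-occupancy distribution that defines successor representations. All function and variable names here (`encode`, `byol_gamma_loss`, the linear encoders, the predictor head) are illustrative assumptions, not the paper's actual architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(W, s):
    # Illustrative linear encoder; the actual method would use a deep network.
    return W @ s

def byol_gamma_loss(W_online, W_target, P_pred, trajectory, gamma=0.9):
    """Hypothetical sketch of a BYOL-gamma-style objective.

    The online branch predicts the target branch's representation of a
    future state s_{t+k}, with k ~ Geometric(1 - gamma), so that in
    expectation the prediction target follows the gamma-discounted
    future-state distribution underlying successor representations.
    """
    losses = []
    T = len(trajectory)
    for t in range(T - 1):
        # Geometric offset approximates the discounted occupancy measure;
        # clamp so we stay inside the trajectory.
        k = min(rng.geometric(1 - gamma), T - 1 - t)
        z_t = encode(W_online, trajectory[t])
        pred = P_pred @ z_t                         # predictor head (online branch)
        z_future = encode(W_target, trajectory[t + k])  # stop-gradient target branch
        # Cosine-similarity loss, as in BYOL; value lies in [0, 2].
        cos = pred @ z_future / (np.linalg.norm(pred) * np.linalg.norm(z_future) + 1e-8)
        losses.append(1.0 - cos)
    return float(np.mean(losses))
```

In a full training loop one would minimize this loss jointly with the BC action loss, updating the target encoder as an exponential moving average of the online encoder; here the encoders are fixed random matrices purely to show the structure of the objective.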