This paper explores how, in conventional AI systems, perception relies on state-based representation learning while planning is handled through exploration. We instead ask whether inference can emerge from representations that capture both perception and temporal structure. We show that standard temporal contrastive learning tends to rely on spurious features and fails to capture temporal structure. To address this, we introduce Combinatorial Representations for Temporal Reasoning (CRTR), which uses a negative sampling method to remove spurious features and facilitate temporal inference. CRTR achieves robust results in domains with complex temporal structure, such as Sokoban and the Rubik's Cube. For the Rubik's Cube in particular, CRTR learns representations that generalize across all initial states and solves the puzzle in fewer exploration steps than BestFS, albeit with longer solutions. To our knowledge, this is the first method to efficiently solve arbitrary Cube states using only learned representations, without relying on an external search algorithm.
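The abstract does not spell out CRTR's negative-sampling scheme, so the following is only a rough illustration of the underlying idea: in an InfoNCE-style temporal contrastive loss, the choice of negatives determines which features the encoder can discard. Drawing negatives from the same trajectory (rather than from other trajectories, as in standard practice) penalizes features that are constant along a trajectory and therefore spurious for temporal inference. All function names, the toy embeddings, and the specific sampling choices below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor: pull the positive close, push negatives away."""
    def sim(a, b):
        # Cosine similarity between two embedding vectors.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    scores = [sim(anchor, positive)] + [sim(anchor, n) for n in negatives]
    logits = np.array(scores) / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # cross-entropy with positive at index 0

rng = np.random.default_rng(0)
# Toy "trajectories": each row is a state embedding; consecutive rows
# are temporally adjacent states (illustrative stand-ins for learned features).
traj_a = rng.normal(size=(5, 8))
traj_b = rng.normal(size=(5, 8))

anchor, positive = traj_a[0], traj_a[1]         # temporally adjacent positive pair

# Standard temporal contrastive learning: negatives come from other trajectories,
# so features identifying the trajectory itself can act as a shortcut.
neg_other_traj = [traj_b[i] for i in range(3)]

# Same-trajectory negatives: trajectory-constant (spurious) features cannot
# separate the positive from these negatives, so they stop being useful.
neg_same_traj = [traj_a[i] for i in range(2, 5)]

loss_standard = info_nce_loss(anchor, positive, neg_other_traj)
loss_same = info_nce_loss(anchor, positive, neg_same_traj)
print(loss_standard, loss_same)
```

The contrast between the two negative pools is the point of the sketch: with same-trajectory negatives, only features that vary over time within a trajectory can lower the loss.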