This paper presents a sim-to-real framework for robust autonomous navigation over unstructured terrain on remote planetary surfaces. Accounting for the dynamics of wheel interactions with complex particle media, we train a reinforcement learning agent through massively parallel simulations in a procedurally generated environment with diverse physical properties. The trained policy is then transferred zero-shot to a real wheeled rover in a lunar-like environment. We compare and analyze several reinforcement learning algorithms and action smoothing filters to identify the most effective combination for real-world deployment. We experimentally demonstrate that agents trained with procedural diversity achieve higher zero-shot performance than agents trained in static scenarios. We also analyze the tradeoffs of fine-tuning with high-fidelity particle physics, showing that it yields only a marginal improvement in low-speed accuracy at a significant computational cost. This work establishes a validated workflow for building robust learning-based navigation systems, a significant step toward the application of autonomous robots to space exploration.