This paper addresses the challenge of exploration in reinforcement learning (RL), particularly in environments with sparse or adversarial reward structures. We study how the architecture of a deep neural network policy implicitly shapes exploration before training. Using a simple model, we demonstrate, both theoretically and experimentally, a strategy for generating ballistic or diffusive trajectories from untrained policies. Leveraging infinite-width network theory and continuous-time limits, we show that untrained policies return temporally correlated actions and produce far-reaching state-visitation distributions. We discuss the distribution of the corresponding trajectories for standard architectures, providing insight into the inductive bias for solving exploration problems early in training. We thereby establish a theoretical and experimental framework in which policy initialization serves as a design tool for understanding exploration behavior.
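
As a rough illustration of the central claim (not the paper's experimental setup), the sketch below rolls out a randomly initialized tanh MLP policy in a toy 2-D point-mass environment and compares the mean-squared displacement (MSD) of its trajectories against an i.i.d.-action baseline: temporally correlated actions yield ballistic-like growth (MSD roughly proportional to t^2), while uncorrelated actions yield diffusive growth (MSD roughly proportional to t). The environment, network widths, and initialization scales are illustrative assumptions.

# Minimal sketch: untrained policy rollouts vs. i.i.d. random actions.
# All hyperparameters (widths, gain, dt, horizon) are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
DT, HORIZON, N_ROLLOUTS = 0.05, 400, 100

def init_mlp(sizes, gain=2.0):
    # Random Gaussian weights with variance gain / fan_in; small random biases.
    return [(rng.normal(0.0, np.sqrt(gain / n_in), size=(n_out, n_in)),
             rng.normal(0.0, 0.1, size=n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp_policy(params, state):
    # Deterministic untrained policy: state -> bounded action via a tanh MLP.
    h = state
    for W, b in params:
        h = np.tanh(W @ h + b)
    return h

def make_untrained_policy():
    params = init_mlp([2, 64, 64, 2])          # fresh untrained network per rollout
    return lambda s: mlp_policy(params, s)

def make_iid_policy():
    return lambda s: rng.uniform(-1.0, 1.0, size=2)   # uncorrelated baseline

def rollout(policy, horizon=HORIZON, dt=DT):
    # Point-mass dynamics: the action sets the velocity of the 2-D state.
    x = rng.normal(size=2)                     # random start so the policy output is non-zero
    traj = [x.copy()]
    for _ in range(horizon):
        x = x + dt * policy(x)
        traj.append(x.copy())
    return np.array(traj)

def msd_exponent(policy_factory):
    # Fit alpha in MSD ~ t^alpha, averaged over independent rollouts.
    disps = []
    for _ in range(N_ROLLOUTS):
        traj = rollout(policy_factory())
        disps.append(np.sum((traj - traj[0]) ** 2, axis=1))
    curve = np.mean(disps, axis=0)
    t = np.arange(1, len(curve))
    return np.polyfit(np.log(t), np.log(curve[1:] + 1e-12), 1)[0]

print(f"MSD exponent, untrained policy: {msd_exponent(make_untrained_policy):.2f} (ballistic ~ 2)")
print(f"MSD exponent, i.i.d. actions:   {msd_exponent(make_iid_policy):.2f} (diffusive ~ 1)")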