While Retrieval-Augmented Generation (RAG) with foundation models has demonstrated strong performance across a variety of tasks, expert-level reasoning, such as solving Olympiad-level physics problems, remains underexplored. Inspired by the way students prepare for competitions by reviewing past problems, we explore the potential of RAG with foundation models to enhance physics reasoning. We introduce PhoPile, a high-quality multimodal dataset designed to enable systematic study of retrieval-based reasoning. PhoPile captures the inherent multimodality of physics problem solving, including images, graphs, and equations. Using PhoPile, we benchmark RAG-augmented foundation models, covering both large language models (LLMs) and large multimodal models (LMMs), with multiple retrievers. Our results demonstrate that integrating physics corpora through retrieval can improve model performance, while also highlighting open challenges that we hope will stimulate further research on retrieval-augmented physics reasoning.