Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
The summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions. When sharing, please cite the source.

Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympiad-Level Physics Problem Solving

Created by
  • Haebom

Authors

Shunfeng Zheng, Yudi Zhang, Meng Fang, Zihan Zhang, Zhitan Wu, Mykola Pechenizkiy, Ling Chen

Outline

Retrieval-augmented generation (RAG) with foundation models has achieved strong performance across a variety of tasks, but its capacity for expert-level reasoning, such as solving Olympiad-level physics problems, remains largely unexplored. Inspired by the way students prepare for competitions by reviewing past problems, we investigate the potential of RAG to enhance the physics reasoning of foundation models. We introduce PhoPile, a high-quality multimodal dataset designed specifically for the systematic study of retrieval-based reasoning on Olympiad-level physics. PhoPile reflects the inherently multimodal nature of physics problem solving, covering images, graphs, and equations. Using PhoPile, we benchmark RAG-augmented foundation models, including both large language models (LLMs) and large multimodal models (LMMs), with multiple retrievers. Our results show that integrating retrieval with physics corpora can improve model performance, while also highlighting challenges that motivate further research on retrieval-augmented physics reasoning.
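The retrieve-then-prompt idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the three-item corpus, the bag-of-words cosine retriever, and the names `CORPUS`, `retrieve`, and `build_prompt` are all assumptions made for the example, and it handles text only, whereas PhoPile is multimodal.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus of past problems (NOT taken from PhoPile).
CORPUS = [
    {"problem": "A block slides down a frictionless incline of angle theta. "
                "Find its acceleration.",
     "solution": "a = g sin(theta)"},
    {"problem": "A capacitor of capacitance C is charged through a resistor R. "
                "Find the time constant.",
     "solution": "tau = R C"},
    {"problem": "A pendulum of length L swings with small amplitude. "
                "Find its period.",
     "solution": "T = 2 pi sqrt(L / g)"},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=1):
    """Return the k past problems most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda item: cosine(q, Counter(tokenize(item["problem"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, examples):
    """Prepend retrieved problem/solution pairs to the new problem."""
    parts = [
        f"Past problem: {ex['problem']}\nSolution: {ex['solution']}"
        for ex in examples
    ]
    parts.append(f"New problem: {query}\nSolution:")
    return "\n\n".join(parts)


query = "What is the period of a pendulum of length L for small oscillations?"
prompt = build_prompt(query, retrieve(query, CORPUS, k=1))
```

The resulting `prompt` string would then be passed to whichever foundation model is being evaluated. A real system would presumably swap the bag-of-words scorer for a stronger text or multimodal retriever.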

Takeaways, Limitations

Integrating physics corpora through RAG can improve model performance.
The PhoPile dataset enables systematic study of retrieval-based reasoning on Olympiad-level physics problems.
PhoPile captures the multimodal nature of physics problem solving (images, graphs, and equations).
Benchmark results show that RAG can improve the physics reasoning of both LLMs and LMMs.
The results also expose open challenges that motivate further research on retrieval-augmented physics reasoning.