This study evaluated the ability of five recently released large language models (LLMs), OpenAI o1-preview, GPT-4o, LLaMA 3.1 (405B), Gemini 1.5 Pro, and Claude 3.5 Sonnet, to answer radiation oncology physics questions. Model performance was assessed on 100 multiple-choice questions written by professional physicists, and reasoning ability was probed by randomly shuffling the answer options or by replacing the correct option with “None of the above answers is correct.” We also examined whether reasoning ability improved with “Explain first” and “Step-by-step” prompting. All models demonstrated expert-level performance, and o1-preview outperformed medical physicists under majority voting. However, when the correct option was replaced with “None of the above answers is correct,” performance dropped significantly, indicating that the models’ reasoning ability still needs improvement. The “Explain first” and “Step-by-step” prompts improved the reasoning ability of some models.
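For illustration only, the evaluation manipulations described above (option shuffling, substitution of the correct option with “None of the above answers is correct,” and majority voting) could be implemented along the lines of the following Python sketch. The question format, the helper names, and the query_model callback are assumptions introduced here for clarity; this is not the code used in the study.

```python
import random
from collections import Counter

NOTA = "None of the above answers is correct."

def shuffle_options(question):
    """Randomly reorder the answer options, tracking the new index of the correct one.

    `question` is assumed to be a dict like
    {"stem": str, "options": [str, ...], "answer_index": int}.
    """
    options = list(question["options"])
    correct = options[question["answer_index"]]
    random.shuffle(options)
    return {**question, "options": options, "answer_index": options.index(correct)}

def replace_correct_with_nota(question):
    """Replace the correct option with the 'None of the above' statement,
    which then becomes the new correct answer (placed last for readability)."""
    options = list(question["options"])
    options[question["answer_index"]] = NOTA
    options.append(options.pop(question["answer_index"]))
    return {**question, "options": options, "answer_index": len(options) - 1}

def majority_vote(question, query_model, n_trials=5):
    """Query the model several times and return its most common answer index.

    `query_model(question) -> int` is a hypothetical wrapper around an LLM API.
    """
    votes = [query_model(question) for _ in range(n_trials)]
    return Counter(votes).most_common(1)[0][0]
```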