This paper is the first to evaluate the ability of large language models (LLMs) to solve operational research (OR) problems, specifically probabilistic modeling problems that characterize uncertainty using tools from probability, statistics, and stochastic processes. We assessed the problem-solving ability of LLMs on manually collected graduate-level assignments and PhD exam questions, and investigated their decision-making ability under real-world uncertainty using SimOpt, an open-source simulation-optimization library. Our results show that while state-of-the-art LLMs demonstrate human-expert-level proficiency in both classroom and real-world settings, substantial further work is needed before the probabilistic modeling pipeline can be reliably automated. This study highlights the potential of building AI agents that support OR researchers and amplify the real-world impact of OR through automation.