Reinforcement learning (RL) techniques have achieved impressive performance on simulated benchmarks such as Atari100k, yet recent advances have largely remained confined to simulation, limiting their transfer to real-world environments. A key obstacle is environmental stochasticity: real-world systems often feature noisy observations, unpredictable dynamics, and unusual conditions that compromise the stability of current methods. Benchmarks that capture this uncertainty are rare, favoring simplified settings in which algorithms can be tuned for success, and the lack of a well-defined taxonomy of stochasticity further complicates evaluation. To address this gap, we introduce STORI (STOchastic-ataRI), a benchmark that systematically incorporates diverse stochastic effects and enables rigorous evaluation of RL techniques under various forms of uncertainty. We propose a comprehensive taxonomy of five types of environmental stochasticity and, through targeted evaluations of DreamerV3 and STORM, demonstrate systematic vulnerabilities in state-of-the-art model-based RL algorithms. Our findings reveal that world models severely underestimate environmental variance, struggle with action corruption, and exhibit unreliable dynamics under partial observability. STORI provides a unified framework for developing more robust RL systems; the code and benchmark are publicly released.