Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

STORI: A Benchmark and Taxonomy for Stochastic Environments

Created by
  • Haebom

Author

Aryan Amit Barsainyan, Jing Yu Lim, Dianbo Liu

Outline

Reinforcement learning (RL) techniques have achieved impressive performance on simulated benchmarks such as Atari 100k, but recent advances remain largely confined to simulation, limiting their transfer to real-world environments. A key obstacle is environmental stochasticity: real-world systems often feature noisy observations, unpredictable dynamics, and unusual conditions that compromise the stability of current methods. Benchmarks that capture this uncertainty are rare and tend to favor simplified settings where algorithms can be tuned for success, and the lack of a well-defined taxonomy of stochasticity further complicates evaluation. To address this gap, we introduce STORI (STOchastic-ataRI), a benchmark that systematically incorporates diverse stochastic effects and enables rigorous evaluation of RL techniques under various forms of uncertainty. We propose a comprehensive taxonomy of five types of environmental stochasticity and demonstrate systematic vulnerabilities in state-of-the-art model-based RL algorithms through targeted evaluations of DreamerV3 and STORM. Our findings reveal that world models severely underestimate environmental variance, struggle with action corruption, and exhibit unreliable dynamics under partial observability. STORI provides a unified framework for developing more robust RL systems, and the code and benchmark have been publicly released.
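
The paper's own tooling is not reproduced here, but two of the stochastic effects it describes (noisy observations and action corruption) are easy to picture as environment wrappers around an Atari task. Below is a minimal, hypothetical Gymnasium sketch illustrating that idea; the class and parameter names (StochasticityWrapper, obs_noise_std, action_corrupt_prob) are our own and are not part of the STORI codebase.

```python
import numpy as np
import gymnasium as gym


class StochasticityWrapper(gym.Wrapper):
    """Illustrative wrapper injecting two kinds of stochasticity discussed
    in the paper: observation noise and action corruption.
    This is NOT the STORI implementation, only a sketch of the concept."""

    def __init__(self, env, obs_noise_std=0.05, action_corrupt_prob=0.1, seed=None):
        super().__init__(env)
        self.obs_noise_std = obs_noise_std              # scale of Gaussian pixel noise
        self.action_corrupt_prob = action_corrupt_prob  # chance the chosen action is replaced
        self.rng = np.random.default_rng(seed)

    def _noisy(self, obs):
        # Add Gaussian noise to the (uint8) observation, then clip to the valid pixel range.
        noise = self.rng.normal(0.0, self.obs_noise_std * 255.0, size=obs.shape)
        return np.clip(obs.astype(np.float64) + noise, 0, 255).astype(obs.dtype)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._noisy(obs), info

    def step(self, action):
        # With some probability, replace the agent's action with a random one
        # (a simple form of action corruption).
        if self.rng.random() < self.action_corrupt_prob:
            action = self.env.action_space.sample()
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._noisy(obs), reward, terminated, truncated, info


# Example usage (requires gymnasium with the Atari extras installed):
# env = StochasticityWrapper(gym.make("ALE/Breakout-v5"),
#                            obs_noise_std=0.05, action_corrupt_prob=0.1, seed=0)
```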

Takeaways, Limitations

Takeaways:
The STORI benchmark, which systematically incorporates environmental stochasticity, provides a framework for evaluating the robustness of reinforcement learning algorithms under real-world-like uncertainty.
A comprehensive five-type taxonomy of environmental stochasticity is proposed.
State-of-the-art model-based RL algorithms such as DreamerV3 and STORM are shown to be vulnerable to environmental uncertainty.
World models are found to underestimate environmental variance, to be vulnerable to action corruption, and to exhibit unreliable dynamics under partial observability.
Limitations:
The paper does not directly discuss the limitations of the STORI benchmark itself. (These may include limitations inherent in the benchmark or its design, such as its confinement to a simulated environment or its applicability only to specific algorithms.)
Evaluation is limited to DreamerV3 and STORM; other algorithms are not extensively tested.
The findings may therefore be specific to these algorithms and may not generalize to others.