Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning

Created by
  • Haebom

Author

James McCarthy, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

Outline

This paper proposes the Optimistic Risk-Averse Actor-Critic (ORAC) algorithm to address overly conservative exploration in risk-averse constrained reinforcement learning (RaCRL), which causes convergence to suboptimal policies. ORAC constructs an exploration policy that maximizes an upper confidence bound of the state-action reward-value function while minimizing a lower confidence bound of the risk-averse state-action cost-value function. This encourages exploration of uncertain regions to discover high-reward states while still satisfying safety constraints, and it demonstrates improved reward-cost trade-offs compared to existing methods in continuous control tasks such as Safety-Gymnasium and CityLearn.
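The core action-selection idea can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes ensembles of reward and cost Q-estimates (from which confidence bounds are derived as mean ± std) and a hypothetical `cost_weight` playing the role of a Lagrangian-style penalty:

```python
import numpy as np

def optimistic_action(candidates, reward_q_ensemble, cost_q_ensemble,
                      beta=1.0, cost_weight=1.0):
    """Pick the candidate action with the best optimistic score.

    Score = UCB(reward Q) - cost_weight * LCB(cost Q):
    optimistic about reward (mean + beta * std) and optimistic that the
    cost is low (mean - beta * std), so actions with uncertain value
    estimates are favored, encouraging exploration of unvisited regions.
    """
    best_action, best_score = None, -np.inf
    for a in candidates:
        r_vals = np.array([q(a) for q in reward_q_ensemble])
        c_vals = np.array([q(a) for q in cost_q_ensemble])
        r_ucb = r_vals.mean() + beta * r_vals.std()   # optimistic reward bound
        c_lcb = c_vals.mean() - beta * c_vals.std()   # optimistic (low) cost bound
        score = r_ucb - cost_weight * c_lcb
        if score > best_score:
            best_action, best_score = a, score
    return best_action, best_score
```

In the paper's actor-critic setting the bounds come from learned (distributional) critics and the exploration policy is trained against this objective rather than chosen by enumeration; the enumeration here is only for illustration.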

Takeaways, Limitations

Takeaways:
Presents a novel exploration-based approach that effectively addresses convergence to suboptimal policies in risk-averse constrained reinforcement learning.
Enables policy learning that effectively explores uncertain regions of the environment while maximizing rewards and satisfying safety constraints.
Experimentally demonstrates performance improvements in various continuous control tasks, including Safety-Gymnasium and CityLearn.
Achieves a more efficient trade-off between reward and risk.
Limitations:
The performance of the proposed algorithm may depend on the specific environment; further research is needed to establish its generalization across diverse environments.
Algorithm performance hinges on accurate estimation of the upper and lower confidence bounds; improved confidence-bound estimation methods are needed.
Computational cost can be high in complex environments; research on improving computational efficiency is needed.