Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data

Created by
  • Haebom

Authors

Jeonghye Kim, Yongjae Shin, Whiyoung Jung, Sunghoon Hong, Deunsol Yoon, Youngchul Sung, Kanghoon Lee, Woohyung Lim

Outline

This paper proposes PARS, a novel algorithm that addresses Q-value extrapolation errors in reinforcement learning with offline data. Specifically, it targets the linear extrapolation of Q-values beyond the range of the dataset and gradually pushes Q-values outside that range downward through reward scaling combined with layer normalization (RS-LN) and a penalty mechanism for infeasible actions (PA). Combining RS-LN and PA, PARS outperforms existing state-of-the-art algorithms on a variety of tasks in the D4RL benchmark, with particularly notable success on challenging tasks such as AntMaze Ultra.
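To make the two mechanisms in the summary concrete, below is a minimal PyTorch sketch of how they could be combined in a critic update. This is not the authors' implementation: the network architecture, the `reward_scale` and `penalty_target` values, and the way infeasible actions are sampled outside the action bounds are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the authors' code) of the two ideas summarized above:
# (1) reward scaling with a layer-normalized critic (RS-LN), and
# (2) a fixed low-value target for infeasible actions sampled outside the
#     action bounds (PA). All hyperparameters here are illustrative.

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.LayerNorm(hidden),   # LayerNorm curbs linear extrapolation of Q-values
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


def critic_loss(critic, target_critic, batch, reward_scale=10.0,
                gamma=0.99, penalty_target=-100.0, action_bound=1.0):
    """One critic update step; `batch` holds (s, a, r, s_next, a_next, done)."""
    s, a, r, s_next, a_next, done = batch

    # Reward scaling: enlarge the reward magnitude so the TD targets occupy a
    # wider numeric range, which together with LayerNorm flattens Q-values
    # outside the data support rather than letting them extrapolate linearly.
    with torch.no_grad():
        td_target = reward_scale * r + gamma * (1.0 - done) * target_critic(s_next, a_next)
    td_loss = ((critic(s, a) - td_target) ** 2).mean()

    # Penalizing infeasible actions: sample actions outside the feasible action
    # box (assumed here to be [-action_bound, action_bound]) and regress their
    # Q-values toward a low penalty target.
    infeasible_a = torch.empty_like(a).uniform_(action_bound, 2.0 * action_bound)
    infeasible_a = infeasible_a * torch.sign(torch.randn_like(a))  # either side of the box
    pa_loss = ((critic(s, infeasible_a) - penalty_target) ** 2).mean()

    return td_loss + pa_loss
```

The intended takeaway of the sketch is the division of labor: the layer-normalized critic and scaled rewards keep out-of-distribution Q-values from growing linearly, while the PA term actively drives actions outside the feasible region toward a low value.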

Takeaways, Limitations

Takeaways:
A novel approach to solving the Q-value extrapolation error problem in offline reinforcement learning is presented.
The PARS algorithm, which combines RS-LN and PA, achieves strong performance on the D4RL benchmark.
Notable gains on difficult tasks such as AntMaze Ultra.
Performance improvements in both offline training and online fine-tuning.
Limitations:
Further research is needed on the generalization performance of the proposed algorithm.
More extensive experiments across a variety of environments and tasks are needed.
Further analysis is needed on hyperparameter tuning of RS-LN and PA.
Further comparative analysis with other offline reinforcement learning algorithms is needed.