Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach

Created by
  • Haebom

Authors

Wenyun Li, Wenjie Huang, Chen Sun

Outline

This paper proposes a method for learning an effective reward function in real-world settings where reward signals are extremely sparse. The method performs reward shaping using all transitions, including zero-reward transitions. Specifically, it combines semi-supervised learning (SSL) with a novel data augmentation technique to learn trajectory-space representations from zero-reward transitions, improving the efficiency of reward shaping. Experiments on Atari games and robotic manipulation show that the method infers rewards more accurately than supervised-learning baselines and improves agent scores. In environments where rewards are even sparser, it achieves best scores up to twice those of existing methods, and the proposed double-entropy data augmentation yields a further gain, with a best score 15.8% higher than other augmentation methods.
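
To make the general idea concrete, below is a minimal sketch (not the authors' implementation) of semi-supervised reward shaping: a reward model is trained with a supervised loss on the rare rewarded transitions and a consistency loss on the abundant zero-reward transitions. The augment function here is a simple Gaussian-noise placeholder standing in for the paper's double-entropy augmentation, and all names (RewardModel, ssl_shaping_loss) are illustrative assumptions rather than the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Small MLP mapping a (state, action) pair to a scalar shaped reward."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs, act):
        z = self.encoder(torch.cat([obs, act], dim=-1))
        return self.head(z).squeeze(-1), z


def augment(x, noise_scale: float = 0.05):
    # Placeholder augmentation (Gaussian noise); the paper's double-entropy
    # augmentation would replace this step.
    return x + noise_scale * torch.randn_like(x)


def ssl_shaping_loss(model, labeled, unlabeled, consistency_weight: float = 1.0):
    """Supervised loss on rare rewarded transitions plus a consistency loss
    on zero-reward transitions, following the generic SSL recipe."""
    obs_l, act_l, r_l = labeled
    pred_l, _ = model(obs_l, act_l)
    sup_loss = F.mse_loss(pred_l, r_l)

    obs_u, act_u = unlabeled
    # Two augmented views of the same zero-reward transition should agree.
    pred_a, _ = model(augment(obs_u), act_u)
    pred_b, _ = model(augment(obs_u), act_u)
    cons_loss = F.mse_loss(pred_a, pred_b)

    return sup_loss + consistency_weight * cons_loss


if __name__ == "__main__":
    obs_dim, act_dim, batch = 8, 2, 32
    model = RewardModel(obs_dim, act_dim)
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)

    # Dummy batches: a few rewarded transitions and many zero-reward ones.
    labeled = (torch.randn(4, obs_dim), torch.randn(4, act_dim), torch.rand(4))
    unlabeled = (torch.randn(batch, obs_dim), torch.randn(batch, act_dim))

    loss = ssl_shaping_loss(model, labeled, unlabeled)
    loss.backward()
    opt.step()
    print(f"loss = {loss.item():.4f}")
```

The shaped reward predicted by such a model would then be added to the sparse environment reward during policy training; the consistency term is what lets the zero-reward transitions contribute to learning the representation.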

Takeaways, Limitations

Takeaways:
Presents a novel method for effective reward shaping in sparse-reward environments.
Exploits information in zero-reward transitions via semi-supervised learning and data augmentation.
Demonstrates superior performance over existing methods in Atari game and robotic manipulation experiments.
Validates the effectiveness of the double-entropy data augmentation technique.
Limitations:
Further experiments are needed to evaluate the generalization of the proposed method.
Applicability to a wider range of sparse-reward environments still needs to be verified.
Guidance on selecting optimal hyperparameters for the data augmentation technique is needed.