Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning

Created by
  • Haebom

Author

Giuseppe Canonaco, Leo Ardon, Alberto Pozanco, Daniel Borrajo

Outline

This paper explores Potential-Based Reward Shaping (PBRS) as a way to mitigate the sample inefficiency of Reinforcement Learning (RL). It highlights two practical difficulties: choosing an appropriate potential function, and the bias introduced when the potential is computed over a finite horizon due to computational constraints. The paper provides a theoretical rationale for why using the optimal value function as the potential function improves performance, analyzes the bias induced by a finite horizon in PBRS, and, by leveraging abstraction to approximate the optimal value function, evaluates the sample efficiency and performance impact of PBRS in four environments: a goal-directed navigation task and three Arcade Learning Environment (ALE) games. Experimental results show that a simple fully-connected network can achieve performance comparable to a CNN-based solution.
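For context, PBRS augments the environment reward with a shaping term derived from a potential function Φ, i.e. r'(s, a, s') = r(s, a, s') + γΦ(s') − Φ(s), which is known to leave the optimal policy unchanged. The sketch below is a minimal illustration of this idea with the potential supplied by a state-value estimate (for example, one computed on an abstraction of the MDP, as the paper does); the function names and the toy distance-based potential are illustrative assumptions, not the authors' code.

```python
import numpy as np

def shaped_reward(r, s, s_next, potential, gamma=0.99, done=False):
    """Potential-based reward shaping: r' = r + gamma * Phi(s') - Phi(s).

    `potential` is any state-value estimate used as the potential function,
    e.g. an approximation of the optimal value function obtained from an
    abstraction of the original MDP. Setting Phi(terminal) = 0 preserves
    the optimal policy.
    """
    phi_next = 0.0 if done else potential(s_next)
    return r + gamma * phi_next - potential(s)

# Toy usage: a goal-directed grid-navigation task where the potential is the
# negated Manhattan distance to the goal -- a crude stand-in for V*.
goal = np.array([9, 9])
potential = lambda s: -float(np.abs(np.asarray(s) - goal).sum())

r_shaped = shaped_reward(r=0.0, s=(0, 0), s_next=(0, 1), potential=potential)
print(r_shaped)  # positive: the step moved the agent closer to the goal
```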

Takeaways, Limitations

Takeaways: The paper presents a theoretical basis for PBRS with the optimal value function as the potential function and experimentally validates its potential to improve sample efficiency and performance. It also shows that a simple fully-connected network can reach performance comparable to CNN-based solutions.
Limitations: The generalization of the proposed method needs further study, and additional experimental validation is required on more diverse environments and more complex tasks. The paper does not present a complete solution to the bias introduced by finite horizons.