Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Priors Matter: Addressing Misspecification in Bayesian Deep Q-Learning

Created by
  • Haebom

Author

Pascal R. van der Vaart, Neil Yorke-Smith, Matthijs T.J. Spaan

Outline

This paper studies uncertainty quantification in reinforcement learning, specifically in Bayesian deep Q-learning. Whereas previous work has focused mainly on improving the accuracy of posterior approximations, the authors examine the prior and likelihood assumptions that define the posterior itself. They demonstrate a "cold posterior effect" in Bayesian deep Q-learning: lowering the temperature of the posterior improves performance, contrary to what Bayesian theory predicts. To explain this phenomenon, they test the likelihood and prior assumptions commonly made in Bayesian model-free algorithms and show experimentally that the Gaussian likelihood assumption is frequently violated. They conclude that developing better-suited likelihoods and priors is an important direction for Bayesian reinforcement learning, and propose an improved prior for deep Q-learning that yields better performance.
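The cold posterior effect can be illustrated with a minimal sketch: tempering divides the log posterior by a temperature T, so T < 1 concentrates the posterior more sharply than exact Bayes' rule prescribes. The Gaussian likelihood and standard-normal prior below are illustrative assumptions for a one-dimensional toy problem, not the paper's actual model.

```python
import numpy as np

def tempered_log_posterior(theta, y, temperature=1.0):
    # Gaussian likelihood (unit noise) and standard-normal prior, both assumed
    # here for illustration; the paper questions exactly this likelihood choice.
    log_lik = -0.5 * np.sum((y[:, None] - theta[None, :]) ** 2, axis=0)
    log_prior = -0.5 * theta ** 2
    # Cold posterior: raise p(theta | D) to the power 1/T,
    # i.e. divide the log density by T.
    return (log_lik + log_prior) / temperature

# Grid-based comparison of posterior spread at T=1 (exact Bayes) vs T=0.25 (cold).
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=20)
grid = np.linspace(-2.0, 4.0, 2001)

def posterior_std(temperature):
    logp = tempered_log_posterior(grid, y, temperature)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    mean = np.sum(grid * p)
    return np.sqrt(np.sum((grid - mean) ** 2 * p))

print(posterior_std(1.0), posterior_std(0.25))  # the cold posterior is narrower
```

The sketch only shows what tempering does to the posterior; the paper's observation is that this narrower, theoretically "wrong" posterior often performs better in deep Q-learning, which points to misspecified priors and likelihoods rather than a failure of Bayes' rule.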

Takeaways, Limitations

Takeaways:
Identifies the "cold posterior effect" in Bayesian deep Q-learning and analyzes its causes.
Experimentally demonstrates that the Gaussian likelihood assumption commonly used in Bayesian reinforcement learning algorithms is frequently violated.
Highlights the need for better-suited priors and likelihoods, and proposes a better-performing Bayesian algorithm based on an improved prior.
Limitations:
The proposed prior improvement may be limited to specific problems or algorithms.
Experimental validation in more diverse and complex environments is needed.
The generalizability of the proposed improvements and their theoretical analysis may be lacking.
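One simple way to probe the Gaussian likelihood assumption discussed above is to measure the excess kurtosis of residuals: Gaussian residuals have excess kurtosis near zero, while heavy-tailed TD errors do not. The data below is synthetic, assumed only to illustrate the check; it is not the paper's experiment.

```python
import numpy as np

def excess_kurtosis(residuals):
    # Sample excess kurtosis: 0 for Gaussian data, positive for heavy tails.
    r = residuals - residuals.mean()
    m2 = np.mean(r ** 2)
    m4 = np.mean(r ** 4)
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(1)
gaussian = rng.normal(size=100_000)
heavy_tailed = rng.standard_t(df=5, size=100_000)  # hypothetical stand-in for TD errors

print(excess_kurtosis(gaussian))      # close to 0
print(excess_kurtosis(heavy_tailed))  # clearly positive
```

A clearly positive excess kurtosis on real TD residuals would indicate, as the paper argues, that a Gaussian likelihood is misspecified for deep Q-learning targets.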