Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries here are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning

Created by
  • Haebom

Authors

Yang Xu, Swetha Ganesh, Vaneet Aggarwal

Outline

This paper presents a non-asymptotic convergence analysis of Q-learning and actor-critic algorithms for robust average-reward Markov decision processes (MDPs) under contamination, total-variation (TV) distance, and Wasserstein uncertainty sets. The key element of the analysis is showing that the optimal robust Bellman operator for the Q-function is a strict contraction with respect to a carefully constructed semi-norm under which constant functions are quotiented out. This property enables a stochastic approximation scheme that learns the optimal robust Q-function with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples. The paper further provides an efficient routine for robust Q-function estimation, which in turn facilitates robust critic estimation. Building on this, it presents an actor-critic algorithm that learns an $\epsilon$-optimal robust policy within $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples. Numerical simulations are provided to evaluate the performance of the algorithms.
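To make the update concrete, below is a minimal tabular sketch of RVI-style robust Q-learning for average reward under a $\delta$-contamination uncertainty set, for which the worst-case term has a closed form. It illustrates the general technique only: the toy MDP, step size, contamination level `delta`, and the reference offset $f(Q) = Q(s_{\text{ref}}, a_{\text{ref}})$ are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Hedged sketch: robust RVI Q-learning for average-reward MDPs under a
# delta-contamination uncertainty set. The MDP, hyperparameters, and the
# reference offset f(Q) = Q[s_ref, a_ref] are illustrative assumptions.

rng = np.random.default_rng(0)
nS, nA = 5, 2                                   # small random MDP (hypothetical)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # nominal kernel p(s' | s, a)
R = rng.uniform(0.0, 1.0, size=(nS, nA))        # rewards r(s, a)

delta = 0.1           # contamination level of the uncertainty set
alpha = 0.05          # step size
s_ref, a_ref = 0, 0   # reference state-action pair for the RVI offset

Q = np.zeros((nS, nA))
s = 0
for t in range(100_000):
    # Epsilon-greedy behavior policy over the current Q estimate.
    a = rng.integers(nA) if rng.random() < 0.1 else int(Q[s].argmax())
    s_next = rng.choice(nS, p=P[s, a])          # sample from the nominal model

    # Delta-contamination robust backup:
    #   inf over {(1 - delta) p + delta q} of E[max_a' Q(s', a')]
    #     = (1 - delta) * E_p[max_a' Q(s', a')] + delta * min_s max_a Q(s, a),
    # where the first term is estimated from the single sampled transition.
    worst_case = Q.max(axis=1).min()
    target = R[s, a] + (1 - delta) * Q[s_next].max() + delta * worst_case

    # Subtract the reference offset so Q stays bounded (average-reward RVI trick).
    Q[s, a] += alpha * (target - Q[s_ref, a_ref] - Q[s, a])
    s = s_next

print("estimated robust average reward:", Q[s_ref, a_ref])
print("greedy policy:", Q.argmax(axis=1))
```

At convergence the offset term plays the role of the gain: for a fixed point, $Q(s_{\text{ref}}, a_{\text{ref}})$ approximates the optimal robust average reward, mirroring how the contraction analysis treats Q-functions only up to constant shifts.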

Takeaways, Limitations

Takeaways:
We provide a theoretical foundation for this setting through a non-asymptotic convergence analysis of Q-learning and actor-critic algorithms for robust average-reward MDPs.
We prove that the optimal robust Q operator is a strict contraction, providing a basis for designing efficient learning algorithms (a hedged illustration follows this list).
We present efficient algorithms for robust Q-function and policy learning that achieve a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$.
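On the second point above: one standard semi-norm that vanishes on constant functions is the span semi-norm, under which the contraction claim takes roughly the following form. This is an illustrative reading of the summary; the exact semi-norm and contraction factor used in the paper may differ.

\[
\|Q\|_{\mathrm{sp}} = \max_{s,a} Q(s,a) - \min_{s,a} Q(s,a),
\qquad
\big\|\mathcal{T}_{\mathrm{rob}} Q_1 - \mathcal{T}_{\mathrm{rob}} Q_2\big\|_{\mathrm{sp}} \le \gamma\,\|Q_1 - Q_2\|_{\mathrm{sp}}, \quad \gamma < 1,
\]

where $\mathcal{T}_{\mathrm{rob}}$ denotes the optimal robust Bellman operator. Because $\|\cdot\|_{\mathrm{sp}}$ is zero exactly on constant functions, fixed points are unique only up to a constant shift, which is the natural ambiguity in the average-reward setting.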
Limitations:
Further experimental validation of the algorithm's performance in real-world applications is needed.
Further research is needed on the generalizability of the results to other types of uncertainty sets beyond the three studied here.
An analysis of the computational complexity of algorithms in high-dimensional state spaces is required.