Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, please cite the source.

Functional Critic Modeling for Provably Convergent Off-Policy Actor-Critic

Created by
  • Haebom

Authors

Qinxun Bai, Yuxuan Han, Wei Xu, Zhengyuan Zhou

Outline

This paper presents an approach to improving sample efficiency in off-policy reinforcement learning with function approximation, focusing on the actor-critic (AC) framework. It targets three key challenges of off-policy AC methods: instability caused by the "deadly triad" (function approximation, bootstrapping, and off-policy learning), the "moving target" problem of evaluating a policy that keeps changing, and the difficulty of accurately estimating the off-policy policy gradient. To address these, the authors introduce a novel concept, functional critic modeling, and propose what they describe as the first off-policy objective-based AC algorithm with provable convergence in the linear function approximation setting. From a practical perspective, they also present a carefully designed neural network architecture for functional critic modeling and demonstrate its effectiveness in preliminary experiments on widely used RL tasks from the DeepMind Control Benchmark.
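The "deadly triad" instability mentioned above can be seen in a standard textbook illustration, which is not taken from this paper: the two-state "w → 2w" example from Sutton and Barto's Reinforcement Learning: An Introduction (2nd ed., Chapter 11). The sketch below is a minimal, assumed setup: one shared weight w, features 1 and 2 so that v(s1) = w and v(s2) = 2w, and off-policy sampling that only ever updates the s1 → s2 transition with reward 0. It applies plain semi-gradient TD(0) and does not implement the paper's functional critic modeling; the function name `td0_w_2w` is hypothetical.

```python
# Two-state "w -> 2w" example: a shared weight w with features x(s1)=1, x(s2)=2,
# so v(s1)=w and v(s2)=2w. Updating only the s1 -> s2 transition (reward 0)
# combines function approximation, bootstrapping, and off-policy sampling.


def td0_w_2w(w0: float, alpha: float, gamma: float, steps: int) -> list[float]:
    """Apply semi-gradient TD(0) repeatedly to the s1 -> s2 transition only."""
    x1, x2 = 1.0, 2.0  # feature values of s1 and s2
    w = w0
    history = [w]
    for _ in range(steps):
        td_error = 0.0 + gamma * (x2 * w) - x1 * w  # r + gamma * v(s2) - v(s1)
        w += alpha * td_error * x1                  # d v(s1) / d w = x1
        history.append(w)
    return history


if __name__ == "__main__":
    for gamma in (0.4, 0.9):
        ws = td0_w_2w(w0=1.0, alpha=0.1, gamma=gamma, steps=100)
        print(f"gamma={gamma}: w after 100 updates = {ws[-1]:.4g}")
```

Each update multiplies w by 1 + α(2γ − 1), so w shrinks toward 0 when γ < 0.5 but grows without bound when γ > 0.5, even though every individual update is an ordinary TD step.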

Takeaways, Limitations

Takeaways:
  • Proposes the first off-policy objective-based AC algorithm with guaranteed convergence (proven in the linear function setting).
  • Introduces functional critic modeling to address the "deadly triad" and "moving target" problems.
  • Presents a carefully designed neural network architecture that makes functional critic modeling usable in practice.
  • Demonstrates the effectiveness of the proposed method in preliminary experiments on the DeepMind Control Benchmark.
Limitations:
  • Convergence is proven only in the linear function setting.
  • The generalization performance of the proposed neural network architecture requires further study.
  • The experiments are preliminary and limited to the DeepMind Control Benchmark, so they are insufficient to establish effectiveness across a wide range of environments.