Daily Arxiv

This page curates AI-related papers published around the world.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Policy Expansion for Bridging Offline-to-Online Reinforcement Learning

Created by
  • Haebom

Authors

Haichao Zhang, Wei Xu, Haonan Yu

Outline

This paper presents a method for learning control policies that combines pre-training on offline data with online fine-tuning via reinforcement learning. To address the problem that useful behaviors of the offline policy can be lost early in conventional online learning, the method treats the offline-trained policy as one candidate in a policy set and expands the set with an additional policy that is trained further online. The two policies are adaptively composed to interact with the environment, and the offline policy is kept intact throughout online learning. This lets the offline policy participate naturally in exploration while its useful behaviors are preserved, and it allows the newly added policy to learn new useful behaviors. Experiments on a range of tasks demonstrate the effectiveness of the proposed method.
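A minimal sketch of the policy-expansion idea described above, assuming a continuous-action task, Gaussian policies, a single Q-network, and softmax (Boltzmann) selection over the critic's values as the adaptive composition rule. The names (offline_policy, new_policy, critic, temperature) and the exact selection rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Simple Gaussian policy used for both the frozen offline policy and the new policy."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

class QNetwork(nn.Module):
    """State-action value function used to adaptively compose the policy set."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def expanded_policy_action(obs, offline_policy, new_policy, critic, temperature=1.0):
    """Adaptively compose the frozen offline policy and the new policy.

    Each candidate proposes an action; the executed action is sampled from a
    softmax over the critic's value of each proposal, so the better candidate
    is chosen more often while both policies keep interacting with the environment.
    """
    with torch.no_grad():
        candidates = torch.stack([
            offline_policy(obs).sample(),  # frozen: preserves offline behaviors
            new_policy(obs).sample(),      # learnable: explores new behaviors
        ])                                  # shape: (2, act_dim)
        q_values = torch.stack([critic(obs, a) for a in candidates]).squeeze(-1)
        probs = torch.softmax(q_values / temperature, dim=0)
        idx = torch.distributions.Categorical(probs).sample()
    return candidates[idx]

# Usage sketch: only new_policy and critic receive gradient updates online;
# offline_policy stays fixed, so its useful behaviors are never overwritten.
obs_dim, act_dim = 17, 6
offline_policy = GaussianPolicy(obs_dim, act_dim)  # assume this was pre-trained offline
new_policy = GaussianPolicy(obs_dim, act_dim)
critic = QNetwork(obs_dim, act_dim)

obs = torch.randn(obs_dim)
action = expanded_policy_action(obs, offline_policy, new_policy, critic)
print(action.shape)
```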

Takeaways, Limitations

Takeaways:
Presents a method that combines the advantages of offline pre-training and online fine-tuning to improve sample efficiency and performance.
Provides an effective strategy for preserving useful offline policy behaviors during online learning.
Adaptive policy composition lets the offline policy take part in exploration naturally while the newly added policy learns new behaviors.
Demonstrates practicality by verifying effectiveness across a variety of tasks.
Limitations:
The performance improvements of the proposed method may be limited to specific tasks or environments.
Performance can vary with the size and composition of the policy set, and finding the optimal configuration can be difficult.
Because experimental results are reported only for a limited set of environments, generalization performance still needs to be evaluated across a broader range of settings.