Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption

Created by
  • Haebom

Author

Longxiang He, Deheng Ye, Junbo Tan, Xueqian Wang, Li Shen

Outline

Offline-to-Online Reinforcement Learning (O2O RL), which pre-trains a policy on offline data and then fine-tunes it through online interaction, is a promising paradigm for real-world applications. However, offline datasets and online interactions in real-world settings are often noisy or even maliciously corrupted, which can severely degrade O2O RL performance. This study proposes Robust Policy Expansion (RPEX), a novel method that mitigates the heavy-tailed behavior of the policy by incorporating Inverse Probability Weighting (IPW) into the online policy. Extensive experiments on the D4RL benchmark show that RPEX achieves state-of-the-art O2O performance under diverse data corruption scenarios.
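As a rough illustration only, and not the authors' implementation, the sketch below shows a policy-expansion-style composite policy that chooses between an offline-policy action and an online-policy action via a softmax over Q-values, together with a generic clipped inverse-probability weight that could be applied to per-sample losses. All names (`pex_select`, `ipw_weight`, `q_fn`) and the clipping scheme are assumptions for demonstration, not details taken from the paper.

```python
import numpy as np

def pex_select(q_fn, state, a_offline, a_online, temperature=1.0, rng=np.random):
    """Policy-expansion-style composite policy (illustrative): choose between the
    frozen offline policy's action and the online policy's action with
    probabilities given by a softmax over their Q-values."""
    q = np.array([q_fn(state, a_offline), q_fn(state, a_online)])
    logits = q / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = rng.choice(2, p=probs)
    return (a_offline, a_online)[idx], probs[idx]

def ipw_weight(selection_prob, clip=10.0):
    """Generic inverse-probability weight with clipping, so that rarely selected
    samples do not receive unbounded influence on the update."""
    return min(1.0 / max(selection_prob, 1e-6), clip)

if __name__ == "__main__":
    # Toy Q-function, state, and actions, purely for demonstration.
    q_fn = lambda s, a: float(np.dot(s, a))
    state = np.array([0.5, -0.2])
    a_off, a_on = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    action, p_sel = pex_select(q_fn, state, a_off, a_on)
    print("chosen action:", action, "selection prob:", p_sel, "weight:", ipw_weight(p_sel))
```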

Takeaways, Limitations

Takeaways:
Presents a novel methodology that improves O2O RL performance under data corruption.
Addresses the heavy-tailed behavior of policies using IPW.
RPEX is simple yet effective, achieving SOTA performance.
Limitations:
Further analysis is needed of how performance varies with the type and severity of data corruption.
The generalizability and stability of RPEX in real-world environments remain to be verified.
Detailed study of RPEX's hyperparameter tuning is needed.