Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Diversifying Policy Behaviors with Extrinsic Behavioral Curiosity

Created by
  • Haebom

Authors

Zhenglin Wan, Xingrui Yu, David Mark Bossens, Yueming Lyu, Qing Guo, Flint Xiaofeng Fan, Yew Soon Ong, Ivor Tsang

Outline

This paper presents Quality Diversity Inverse Reinforcement Learning (QD-IRL), a framework that integrates Quality Diversity (QD) optimization with Inverse Reinforcement Learning (IRL) so that an agent learns diverse and robust behaviors rather than a single expert policy. Within this framework, the paper introduces Extrinsic Behavioral Curiosity (EBC), which gives the agent an additional curiosity reward based on how novel its behavior is relative to an existing behavior archive. Experiments on several robot locomotion tasks show that EBC improves QD-IRL instances built on GAIL, VAIL, and DiffAIL by up to 185%, and surpasses expert performance by up to 20% in the Humanoid environment. The paper also shows that EBC applies to Gradient Arborescence based QD reinforcement learning (QD-RL) algorithms, where it significantly improves performance, making it a general technique. The source code is available on GitHub.
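The curiosity reward described above can be illustrated with a toy sketch. This is not the paper's implementation: the grid archive, the binary bonus, and the names `GridArchive`, `shaped_reward`, and `bonus_weight` are all assumptions chosen for illustration. It shows only the general idea of adding a bonus when a behavior lands in a part of the archive that has not been filled yet.

```python
from typing import Dict, Tuple

Cell = Tuple[int, ...]


class GridArchive:
    """Toy behavior archive (hypothetical): splits [0, 1]^d into grid cells."""

    def __init__(self, bins_per_dim: int) -> None:
        self.bins_per_dim = bins_per_dim
        # Best fitness seen so far in each occupied cell.
        self.best_fitness: Dict[Cell, float] = {}

    def cell_of(self, descriptor: Tuple[float, ...]) -> Cell:
        # Clamp each coordinate to [0, 1], then map it to a bin index.
        return tuple(
            min(int(max(0.0, min(1.0, x)) * self.bins_per_dim),
                self.bins_per_dim - 1)
            for x in descriptor
        )

    def is_novel(self, descriptor: Tuple[float, ...]) -> bool:
        # A behavior counts as novel if its cell is still empty.
        return self.cell_of(descriptor) not in self.best_fitness

    def add(self, descriptor: Tuple[float, ...], fitness: float) -> None:
        # Keep only the best-performing behavior per cell.
        cell = self.cell_of(descriptor)
        if fitness > self.best_fitness.get(cell, float("-inf")):
            self.best_fitness[cell] = fitness


def shaped_reward(
    imitation_reward: float,
    descriptor: Tuple[float, ...],
    archive: GridArchive,
    bonus_weight: float = 1.0,
) -> float:
    # Assumed form: imitation reward plus a weighted binary novelty bonus.
    bonus = 1.0 if archive.is_novel(descriptor) else 0.0
    return imitation_reward + bonus_weight * bonus


archive = GridArchive(bins_per_dim=4)
first = shaped_reward(0.5, (0.1, 0.9), archive, bonus_weight=2.0)
archive.add((0.1, 0.9), fitness=1.0)
second = shaped_reward(0.5, (0.12, 0.88), archive, bonus_weight=2.0)
print(first, second)
```

In this sketch the first behavior earns the bonus because its cell is empty, while the second falls into the same, now occupied, cell and receives only the imitation reward. A real system would likely use a finer notion of novelty and a learned reward; those details are left open here.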

Takeaways, Limitations

Takeaways:
Introduces QD-IRL and EBC, which overcome the limitations of single-expert policy learning and learn diverse and robust behaviors.
Verifies experimentally that EBC improves both exploration and performance across a range of robot locomotion behaviors.
Provides a general technique applicable to various IRL and QD-RL algorithms.
Achieves results that surpass expert performance.
Supports reproducibility and extensibility by releasing the source code.
Limitations:
The effectiveness of EBC may depend on specific environments and algorithms.
Further research is needed on how to manage large-scale behavioral archives and how to make efficient comparisons.
Additional validation and safety assurance are needed for real-world applications.
Further research is needed on optimizing and generalizing the design of the EBC reward.