
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning

Created by
  • Haebom

Author

Elias Malomgré and Pieter Simoens

Outline

This paper argues that reinforcement learning (RL) agents should be able to learn from alternative supervision signals, such as reward-free interactions and unlabeled or incomplete demonstrations, rather than relying solely on explicit reward maximization. Developing general agents that adapt efficiently to real-world environments requires exploiting these reward-free signals to guide learning and behavior. The paper proposes a framework that applies a mapping function to turn the similarity between the agent's states and expert data into a well-shaped intrinsic reward, enabling flexible, goal-directed exploration of expert-like behaviors. A Mixture of Autoencoder Experts is used to capture diverse behaviors and to accommodate missing information in the demonstrations. Experiments show that the method achieves robust exploration and performance in both sparse- and dense-reward environments, even when demonstrations are scarce or incomplete, making it a practical framework for RL in realistic settings where optimal data is unavailable and precise reward design is required.
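The core idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tied-weight linear autoencoders (fit via PCA), the use of minimum reconstruction error as the similarity measure, and the exponential mapping `exp(-beta * error)` are all simplifying assumptions chosen to show the shape of the technique, where a state well reconstructed by at least one expert (i.e., resembling some demonstrated behavior mode) receives an intrinsic reward near 1.

```python
import numpy as np

def fit_linear_ae(X, k=2):
    """Fit a tied-weight linear autoencoder to expert states X.
    The top-k principal directions give the optimal linear encoder."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                     # (mean, k x d encoder matrix)

def recon_error(x, expert):
    """Squared reconstruction error of one autoencoder expert."""
    mu, W = expert
    z = W @ (x - mu)                      # encode
    x_hat = mu + W.T @ z                  # decode
    return float(np.sum((x - x_hat) ** 2))

def intrinsic_reward(x, experts, beta=1.0):
    """Map the best-matching expert's error to a bounded intrinsic
    reward in (0, 1]: expert-like states score near 1."""
    return float(np.exp(-beta * min(recon_error(x, e) for e in experts)))

# Toy demo: two behavior modes, each captured by its own expert.
rng = np.random.default_rng(0)
demos_a = rng.normal([2.0, 2.0, 0.0, 0.0], 0.1, size=(200, 4))
demos_b = rng.normal([-2.0, 0.0, 2.0, 0.0], 0.1, size=(200, 4))
experts = [fit_linear_ae(demos_a), fit_linear_ae(demos_b)]

r_near = intrinsic_reward(demos_a[0], experts)    # expert-like state
r_far = intrinsic_reward(np.zeros(4), experts)    # unfamiliar state
```

In this sketch, `r_near` comes out close to 1 while `r_far` is much smaller, so adding the intrinsic reward to the environment reward steers exploration toward expert-like regions of the state space. The paper's actual mixture uses learned (nonlinear) autoencoder experts and a shaped mapping function, which this linear toy only approximates.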

Takeaways, Limitations

Takeaways:
Presents a novel reinforcement learning framework that effectively exploits incomplete or imperfect expert demonstrations.
Achieves robust exploration and performance in both sparse- and dense-reward environments.
Demonstrates the applicability of RL in realistic settings where optimal data is unavailable and precise reward design is required.
Leverages a Mixture of Autoencoder Experts to capture diverse behaviors and handle missing information.
Limitations:
Further study is needed on the generality of the proposed mapping function and its applicability across diverse tasks.
Scalability to high-dimensional state and action spaces requires further evaluation.
A comparative analysis against other intrinsic-motivation techniques is needed.