Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation

Created by
  • Haebom

Authors

Tien Pham, Xinyun Chi, Khang Nguyen, Manfred Huber, Angelo Cangelosi

Outline

This paper presents DeGuV, a novel framework that addresses a key weakness of visual reinforcement learning (RL): agents trained on visual inputs struggle to generalize their learned skills to new environments. DeGuV uses a learnable mask network that takes depth information and produces a mask that keeps only the important visual information and discards unnecessary pixels. This lets the agent focus on key features and makes it more robust under data augmentation. DeGuV also incorporates contrastive learning and stabilizes Q-value estimation under augmentation, which further improves sample efficiency and training stability. Evaluated on the RL-ViGen benchmark with the Franka Emika robot, DeGuV outperforms state-of-the-art methods in both generalization and sample efficiency, including in zero-shot simulation-to-real transfer, while also improving interpretability by highlighting the most relevant regions of the visual input.
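To make the depth-guided masking idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the mask network can be stood in for by a single logistic unit applied to each pixel's depth with hand-set weights; the paper's network is learnable and more expressive. The names `depth_mask`, `apply_mask`, `max_abs_diff` and the parameters `w` and `b` are hypothetical, and images are plain nested lists with one channel for brevity.

```python
import math
from typing import List

Image = List[List[float]]  # H x W grid, one channel for brevity


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def depth_mask(depth: Image, w: float, b: float) -> Image:
    """Hypothetical stand-in for the mask network: one logistic unit per pixel.

    With w < 0, near pixels (small depth) map close to 1 and
    far pixels (large depth) map close to 0.
    """
    return [[sigmoid(w * d + b) for d in row] for row in depth]


def apply_mask(rgb: Image, mask: Image) -> Image:
    """Element-wise product: pixels with a low mask value are suppressed."""
    return [[p * m for p, m in zip(prow, mrow)] for prow, mrow in zip(rgb, mask)]


def max_abs_diff(a: Image, b: Image) -> float:
    """Largest per-pixel difference between two images of the same shape."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    depth = [[0.5, 0.5, 3.0],
             [0.5, 3.0, 3.0]]       # object at 0.5 m, background at 3.0 m
    clean = [[0.8, 0.7, 0.2],
             [0.9, 0.1, 0.3]]
    augmented = [[0.8, 0.7, 0.9],   # only the background pixels are recolored
                 [0.9, 0.6, 0.0]]
    mask = depth_mask(depth, w=-8.0, b=10.0)
    print("raw difference:   ", max_abs_diff(clean, augmented))
    print("masked difference:", max_abs_diff(apply_mask(clean, mask),
                                             apply_mask(augmented, mask)))
```

With `w` negative, near pixels keep almost all of their value and far pixels are driven toward zero, so recoloring the background (a typical augmentation) barely changes the masked observation. In DeGuV the mask is learnable rather than hand-set as here, so which pixels survive is decided by training, not by a fixed depth threshold.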

Takeaways, Limitations

Takeaways:
We present DeGuV, a novel framework that simultaneously improves the generalization performance and sample efficiency of reinforcement learning agents.
A learnable mask network focuses the agent on important visual information and improves robustness to data augmentation.
Contrastive learning and stabilized Q-value estimation improve sample efficiency and training stability.
DeGuV achieves state-of-the-art performance in zero-shot simulation-to-real transfer.
DeGuV improves interpretability by highlighting the most relevant regions of the visual input.
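The takeaway on contrastive learning and Q-value stabilization names two training signals. The Python sketch below shows one generic form of each, as assumptions rather than the paper's exact losses: an InfoNCE loss that pulls together embeddings of two views of the same observation, and an SVEA-style TD loss in which the bootstrap target is computed from un-augmented observations only while the regression is applied to Q-values of both the clean and the augmented view. Which views DeGuV contrasts and which loss weights it uses are not given in this summary; `info_nce`, `td_loss_under_augmentation`, `temperature`, `alpha` and `beta` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    anchors[i] and positives[i] are embeddings of two views of the same
    observation; every positives[j] with j != i acts as a negative for anchors[i].
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the matching index as label
    return total / len(anchors)


def td_loss_under_augmentation(q_clean: List[float], q_aug: List[float],
                               rewards: List[float], next_v_clean: List[float],
                               gamma: float = 0.99, alpha: float = 0.5,
                               beta: float = 0.5) -> float:
    """SVEA-style TD loss (assumed, not taken from the paper).

    The bootstrap target uses only the clean next observation; the squared TD
    error is applied to Q-values of both the clean and the augmented current
    observation. Terminal-state handling is omitted for brevity.
    """
    loss = 0.0
    for qc, qa, r, v in zip(q_clean, q_aug, rewards, next_v_clean):
        target = r + gamma * v  # never computed from augmented inputs
        loss += alpha * (qc - target) ** 2 + beta * (qa - target) ** 2
    return loss / len(q_clean)


if __name__ == "__main__":
    views = [[1.0, 0.0], [0.0, 1.0]]
    print("matched pairs:", info_nce(views, views))
    print("swapped pairs:", info_nce(views, views[::-1]))
    print("TD loss:", td_loss_under_augmentation([0.0], [2.0], [1.0], [0.0]))
```

Computing the target from clean observations keeps the bootstrap target from inheriting augmentation noise, while the `beta`-weighted term still pushes the Q-function to assign the augmented view the same value. In a gradient-based implementation the target would also be excluded from backpropagation.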
Limitations:
Results are reported only on the RL-ViGen benchmark; further research is needed to determine how well DeGuV generalizes to other benchmarks and tasks.
The design and optimization of the learnable mask network are not described in detail.
Further experiments are needed to establish its applicability and scalability in real robotic environments.