Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning

Created by
  • Haebom

Authors

Seungho Baek, Taegeon Park, Jongchan Park, Seungjun Oh, Yusung Kim

Outline

Existing offline hierarchical reinforcement learning (HRL) methods rely on high-level policy learning to generate sub-goal sequences, but their efficiency deteriorates as the task horizon grows, and they lack effective strategies for stitching useful state transitions across different trajectories. This paper proposes Graph-Assisted Stitching (GAS), a novel framework that formulates sub-goal selection as a graph search problem rather than explicit high-level policy learning. By embedding states in a temporal distance representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. It then applies a shortest-path algorithm to select sub-goal sequences within the graph, while low-level policies learn to reach those sub-goals. To improve graph quality, the authors introduce a temporal efficiency (TE) metric that filters out noisy or inefficient transition states, significantly improving task performance. GAS outperforms previous offline HRL methods on locomotion, navigation, and manipulation tasks; in particular, it achieves a score of 88.3 on the most stitching-intensive task, far surpassing the previous best score of 1.0. The source code can be found at https://github.com/qortmdgh4141/GAS .
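The stitch-then-search idea can be sketched with a toy example. This is not the paper's implementation: the grid-based `cluster` function is a stand-in for TDR-space clustering, edges carry unit cost, and the trajectory format (lists of 2-D states) is an assumption made purely for illustration.

```python
import heapq
from collections import defaultdict

def cluster(state, cell=1.0):
    # Toy stand-in for TDR clustering: snap a 2-D state to a grid cell so
    # that nearby states from different trajectories share one graph node.
    return (round(state[0] / cell), round(state[1] / cell))

def build_graph(trajectories, cell=1.0):
    # Each consecutive transition within a trajectory becomes a unit-cost
    # edge between the clusters of its endpoints; because clusters are
    # shared across trajectories, this "stitches" them together.
    edges = defaultdict(set)
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):
            a, b = cluster(s, cell), cluster(s_next, cell)
            if a != b:
                edges[a].add(b)
    return edges

def shortest_subgoal_path(edges, start, goal):
    # Dijkstra with unit edge costs; the returned node sequence acts as the
    # sub-goal sequence handed to the low-level policy.
    dist, prev = {start: 0}, {}
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v in edges[u]:
            if d + 1 < dist.get(v, float("inf")):
                dist[v], prev[v] = d + 1, u
                heapq.heappush(pq, (d + 1, v))
    if goal not in dist:
        return None  # goal unreachable in the stitched graph
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

With two trajectories that only overlap at one state, e.g. (0,0)→(1,0)→(2,0) and (2,0)→(2,1)→(2,2), the stitched graph yields a path from (0,0) to (2,2) that no single trajectory contains, which is the core benefit of stitching.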

Takeaways, Limitations

Takeaways:
Presents a novel framework for efficiently selecting sub-goal sequences via graph search without relying on high-level policy learning.
Efficiently stitches useful state transitions across different trajectories using a temporal distance representation (TDR).
Improves graph quality and task performance through the temporal efficiency (TE) metric.
Outperforms existing methods across a variety of tasks (locomotion, navigation, manipulation).
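The summary does not define the TE metric, so the following is only a hypothetical illustration of the general idea of filtering transitions by progress per step: the `temporal_efficiency` function, the `threshold` value, and the transition fields `td_progress` and `steps` are all assumptions, not the paper's formulation.

```python
def temporal_efficiency(td_progress, steps):
    # Hypothetical stand-in for the TE metric: temporal-distance progress
    # achieved per environment step along a transition.
    return td_progress / max(steps, 1)

def filter_transitions(transitions, threshold=0.5):
    # Keep only transitions whose progress-per-step clears the threshold;
    # noisy or inefficient transitions are dropped before graph building.
    return [t for t in transitions
            if temporal_efficiency(t["td_progress"], t["steps"]) >= threshold]
```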
Limitations:
Further research is needed on efficient embedding methods for the TDR space and on optimizing the TE metric.
Computational cost may grow as the graph size increases.
Hyperparameter settings may need to be tuned for specific tasks.