Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning

Created by
  • Haebom

Authors

Seungho Baek, Taegeon Park, Jongchan Park, Seungjun Oh, Yusung Kim

Outline

Existing offline hierarchical reinforcement learning (HRL) methods rely on high-level policy learning to generate subgoal sequences, but their efficiency deteriorates as the task horizon grows, and they lack an effective strategy for stitching together useful state transitions from different trajectories. This paper proposes Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states in a temporal distance representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, and a low-level policy is trained to reach these subgoals. To improve graph quality, the authors introduce a temporal efficiency (TE) metric that filters out noisy or inefficient transition states, significantly improving task performance. GAS outperforms existing offline HRL methods on locomotion, navigation, and manipulation tasks. In particular, it achieves a score of 88.3 on the most stitching-critical task, dramatically surpassing the previous state-of-the-art score of 1.0. The source code is available at https://github.com/qortmdgh4141/GAS .
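To make the pipeline concrete, below is a minimal sketch of the graph-construction and subgoal-selection steps described above. It assumes a pretrained TDR encoder (here a hypothetical function `tdr_encode`) and offline trajectories given as lists of states; node merging uses a simple distance threshold and edge weights use TDR-space distance. All names and thresholds are illustrative, not the authors' implementation.

```python
# Illustrative sketch of graph-assisted stitching (not the authors' code).
# Assumes: tdr_encode(state) -> np.ndarray embedding in TDR space (pretrained),
# and `trajectories` = list of state sequences from the offline dataset.
import numpy as np
import networkx as nx

CLUSTER_RADIUS = 1.0  # hypothetical threshold for merging states into one node

def build_graph(trajectories, tdr_encode):
    centroids = []          # TDR-space representative per graph node
    graph = nx.DiGraph()

    def assign_node(z):
        # Merge semantically similar states (close in TDR space) into one node.
        for i, c in enumerate(centroids):
            if np.linalg.norm(z - c) < CLUSTER_RADIUS:
                return i
        centroids.append(z)
        graph.add_node(len(centroids) - 1)
        return len(centroids) - 1

    for traj in trajectories:
        prev = None
        for state in traj:
            node = assign_node(tdr_encode(state))
            if prev is not None and prev != node:
                # Transitions from different trajectories can land on shared
                # nodes, which is what enables stitching across trajectories.
                w = np.linalg.norm(centroids[prev] - centroids[node])
                graph.add_edge(prev, node, weight=w)
            prev = node
    return graph, centroids

def select_subgoals(graph, centroids, start_node, goal_node):
    # A shortest path over the graph yields the subgoal sequence; each node's
    # centroid serves as a subgoal for the low-level, goal-conditioned policy.
    path = nx.shortest_path(graph, start_node, goal_node, weight="weight")
    return [centroids[n] for n in path]
```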

Takeaways, Limitations

Takeaways:
Proposes a novel framework (GAS) that overcomes the limitations of high-level policy learning in offline hierarchical reinforcement learning, enabling efficient subgoal sequence generation and state transition stitching.
Addresses the performance degradation caused by long task horizons through temporal distance representation (TDR) and a graph-search-based subgoal selection strategy.
Improves graph quality and task performance through the temporal efficiency (TE) metric (see the illustrative filtering sketch after this list).
Achieves superior performance over existing methods across a variety of tasks (locomotion, navigation, manipulation).
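The summary does not spell out how temporal efficiency is computed. Purely as an illustration, one way such a filter could work is to score each transition by how much TDR-space progress it makes per environment step and discard transitions below a threshold; the definition below is an assumption, not the paper's formula, and the field names are hypothetical.

```python
# Hypothetical temporal-efficiency-style filter (assumed definition;
# the paper's actual TE metric may differ).
import numpy as np

def temporal_efficiency(z_from, z_to, n_steps):
    # TDR-space progress divided by the environment steps it took.
    return np.linalg.norm(z_to - z_from) / max(n_steps, 1)

def filter_transitions(transitions, threshold=0.5):
    # Keep only transitions that make efficient progress; noisy or wandering
    # segments are dropped before they enter the graph.
    return [t for t in transitions
            if temporal_efficiency(t["z_from"], t["z_to"], t["steps"]) >= threshold]
```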
Limitations:
Further research is needed on the design and optimization of the TDR space and the TE metric.
Generalization needs to be evaluated across more diverse environments and task complexities.
The growth of computational cost as the graph becomes larger remains to be addressed.