Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Sketch Decompositions for Classical Planning via Deep Reinforcement Learning

Created by
  • Haebom

Author

Michael Aichmüller, Hector Geffner

Outline

This paper addresses the importance of identifying common subgoal structures in planning and reinforcement learning for achieving long-horizon goals. In classical planning, subgoal structures can be expressed as feature-based rules called sketches, which decompose a problem into subproblems solvable in polynomial time by a greedy sequence of IW(k) searches. Existing sketch-learning methods based on feature pools and min-SAT solvers are limited in scalability and expressiveness. To address these limitations, the paper proposes a deep reinforcement learning (DRL) method that learns general policies for a modified planning problem in which the successor states of a state s are the states reachable from s via IW(k) search. Experimental evaluation across a variety of domains shows that, although the proposed DRL method does not produce interpretable rule-based sketches, the resulting decompositions are clearly understandable.
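
To make the role of IW(k) concrete, below is a minimal sketch of novelty-pruned breadth-first search, the core of IW(k). It assumes states are frozensets of ground atoms; `successors` and `is_goal` are hypothetical problem-specific callables, not part of the paper's code.

```python
from collections import deque
from itertools import combinations

def iw_search(initial_state, successors, is_goal, k=1):
    """Breadth-first search with novelty-based pruning (IW(k)).

    A state is expanded only if it makes some tuple of at most k atoms
    true for the first time in the search. `initial_state` is a
    frozenset of ground atoms; `successors(s)` yields (action, state)
    pairs and `is_goal(s)` returns a bool (hypothetical interface).
    """
    seen_tuples = set()

    def is_novel(state):
        # Collect all atom tuples of size <= k not seen in any
        # previously generated state.
        new = [t for size in range(1, k + 1)
               for t in combinations(sorted(state), size)
               if t not in seen_tuples]
        seen_tuples.update(new)
        return bool(new)

    is_novel(initial_state)  # register the initial state's tuples
    queue = deque([(initial_state, [])])
    while queue:
        state, path = queue.popleft()
        if is_goal(state):
            return path
        for action, next_state in successors(state):
            if is_novel(next_state):
                queue.append((next_state, path + [action]))
    return None  # goal not reachable within width k
```

The novelty test is what makes IW(k) polynomial: the number of atom tuples of size at most k is polynomial in the number of atoms, so only polynomially many states can ever be expanded.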

Takeaways, Limitations

Takeaways: The paper proposes a novel method for learning sketch decompositions of planning problems via deep reinforcement learning, suggesting a way past the scalability and expressiveness limits of existing approaches. It demonstrates efficient problem solving through a greedy sequence of IW(k) searches (a code sketch of this decomposition loop follows this list). Although the learned sketches are not in interpretable rule form, the resulting decompositions are clearly understandable.
Limitations: The proposed DRL method does not generate sketches in interpretable rule form. Further research is needed on the generalization of the learned decompositions, and broader experimental validation across additional planning domains is still required.
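
As an illustration of the modified planning problem described in the Outline, here is a hypothetical environment sketch in which each DRL step selects one state from the set reachable via IW(k). The `problem` interface (`initial_state`, `successors`, `is_goal`), the `choose` callable standing in for the learned policy, and the reward scheme are all assumptions for illustration, not the paper's implementation.

```python
from collections import deque
from itertools import combinations

def iw_reachable(state, successors, k=1):
    """Return all states that IW(k) expands from `state`."""
    seen, frontier, out = set(), deque([state]), []

    def novel(s):
        # Novel iff some tuple of <= k atoms appears for the first time.
        new = [t for n in range(1, k + 1)
               for t in combinations(sorted(s), n) if t not in seen]
        seen.update(new)
        return bool(new)

    novel(state)  # register the root's tuples
    while frontier:
        s = frontier.popleft()
        for _, s2 in successors(s):
            if novel(s2):
                out.append(s2)
                frontier.append(s2)
    return out

class IWDecompositionEnv:
    """Hypothetical environment: one policy step = one IW(k) subproblem."""

    def __init__(self, problem, k=1):
        self.problem, self.k = problem, k
        self.state = problem.initial_state

    def step(self, choose):
        # `choose` stands in for the learned policy's selection among the
        # IW(k)-reachable states; chaining such selections yields the
        # greedy decomposition the DRL method learns.
        options = iw_reachable(self.state, self.problem.successors, self.k)
        self.state = choose(options)
        done = self.problem.is_goal(self.state)
        reward = 1.0 if done else 0.0  # assumed reward, not from the paper
        return self.state, reward, done
```

The point of the wrapper is that the policy never reasons about individual actions: its action space is the set of IW(k)-reachable states, so each decision corresponds to one subproblem of the decomposition.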