Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Bidirectional Task-Motion Planning Based on Hierarchical Reinforcement Learning for Strategic Confrontation

Created by
  • Haebom

Authors

Qizhen Wu, Lei Chen, Kexin Liu, Jinhu Lu

Outline

This paper proposes a novel bidirectional approach that integrates discrete commands and continuous actions for efficient decision-making in adversarial swarm-robotics scenarios such as strategic confrontations. Existing task and motion planning methods decouple decision-making into two layers, but their unidirectional structure fails to capture the interdependencies between the layers, which limits adaptability in dynamic environments. The proposed approach, based on hierarchical reinforcement learning, maps commands to task assignments and actions to path planning, and uses cross-training techniques to enhance learning across the hierarchical framework. It also introduces a trajectory prediction model that links abstract task representations to feasible planning goals. Experimental results show that the proposed approach outperforms existing methods, achieving a match-winning rate of over 80% and a decision-making time of less than 0.01 seconds. Large-scale experiments and real-world robot experiments further demonstrate its generalizability and practicality.
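The data flow described above (discrete task assignment in the upper layer, continuous motion in the lower layer, and a trajectory predictor linking the two) can be illustrated with a minimal sketch. This is not the paper's method: the learned policies are replaced by hand-written stand-ins (greedy assignment, straight-line motion, constant-velocity prediction), and every function name and parameter below is hypothetical. It only shows how lower-layer cost feedback can inform the upper-layer assignment.

```python
import math

Point = tuple[float, float]


def predict_goal(pos: Point, vel: Point, horizon: float) -> Point:
    """Stand-in trajectory predictor (assumption): constant-velocity extrapolation."""
    return (pos[0] + vel[0] * horizon, pos[1] + vel[1] * horizon)


def travel_cost(robot: Point, goal: Point) -> float:
    """Lower-layer feedback to the upper layer: straight-line distance to the goal."""
    return math.dist(robot, goal)


def assign_tasks(robots: list[Point], goals: list[Point]) -> list[int]:
    """Upper layer (discrete): each robot greedily claims the cheapest free goal.

    Assumes there are at least as many goals as robots.
    """
    free = list(range(len(goals)))
    assignment = []
    for robot in robots:
        best = min(free, key=lambda g: travel_cost(robot, goals[g]))
        free.remove(best)
        assignment.append(best)
    return assignment


def step_toward(robot: Point, goal: Point, max_step: float) -> Point:
    """Lower layer (continuous): move at most max_step toward the goal."""
    d = math.dist(robot, goal)
    if d <= max_step:
        return goal
    scale = max_step / d
    return (robot[0] + (goal[0] - robot[0]) * scale,
            robot[1] + (goal[1] - robot[1]) * scale)


def decide(robots, opponents, horizon=1.0, max_step=0.5):
    """One decision step: predict goals, assign tasks, then move each robot."""
    goals = [predict_goal(pos, vel, horizon) for pos, vel in opponents]
    assignment = assign_tasks(robots, goals)
    moves = [step_toward(r, goals[g], max_step) for r, g in zip(robots, assignment)]
    return assignment, moves


if __name__ == "__main__":
    robots = [(0.0, 0.0), (10.0, 0.0)]
    opponents = [((9.0, 0.0), (0.0, 0.0)), ((1.0, 0.0), (0.0, 0.0))]
    print(decide(robots, opponents))
```

In a learned system, `assign_tasks` and `step_toward` would be replaced by trained policies; the sketch keeps them deterministic so the interaction between the layers is easy to follow.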

Takeaways, Limitations

Takeaways:
A bidirectional decision-making method based on hierarchical reinforcement learning enables efficient, adaptive swarm-robot behavior in confrontational situations.
The method achieves a match-winning rate of over 80% and a decision-making time of less than 0.01 seconds.
Verification of generalizability and practicality through large-scale simulations and real robot experiments.
More efficient task and motion planning through integration of discrete commands and continuous actions.
Limitations:
The performance of the proposed method may depend on the specific experimental environment. Further validation in various environments is required.
The accuracy of the trajectory prediction model can impact overall system performance. More sophisticated prediction models are needed.
The scale of real-world robotics experiments may be limited. Further verification of generalizability is needed through more extensive experiments.