
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems

Created by
  • Haebom

Authors

HM Sabbir Ahmad, Ehsan Sabouni, Alexander Wasilkoff, Param Budhraja, Zijian Guo, Songyuan Zhang, Chuchu Fan, Christos Cassandras, Wenchao Li

Outline

This paper addresses safe policy learning in safety-critical multi-agent autonomous systems, where each agent must satisfy its safety requirements at all times while cooperating with other agents to complete tasks. The authors propose a hierarchical multi-agent reinforcement learning (HMARL) approach based on control barrier functions (CBFs). The approach decomposes the overall reinforcement learning problem into two levels: joint cooperative behavior is learned at the high level, and safe individual behavior is learned at the low (agent) level, conditioned on the high-level policy. Specifically, they propose a skill-based HMARL-CBF algorithm in which the high level learns a joint policy over the skills of all agents and the low level learns a policy that executes those skills safely using CBFs. The approach is validated in challenging scenarios in which a large number of agents must safely navigate conflicting road networks. Compared with existing state-of-the-art methods, it significantly improves safety, achieving a near-perfect success/safety rate (within 5%) while also improving performance in all environments.
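To make the two-level idea concrete, here is a minimal Python sketch of a CBF safety filter sitting beneath a skill choice. It is not the paper's algorithm: the 1D single-integrator dynamics, the barrier h(x) = x_obs - x - d_min, the skill table `SKILLS`, and every parameter value are assumptions made purely for illustration, and the learned high-level policy is replaced by a fixed skill name. For these dynamics the CBF condition dh/dt + alpha * h >= 0 reduces to u <= alpha * h, so the usual minimally invasive quadratic program has a closed-form solution.

```python
# Illustrative only: a hand-written CBF filter for a 1D single integrator
# (x_dot = u). Nothing here is taken from the paper's implementation.

# Hypothetical skills: each maps to a nominal forward speed.
SKILLS = {"cruise": 1.0, "yield": 0.2}


def barrier(x, x_obs, d_min):
    """h(x) >= 0 means the agent is at least d_min behind the obstacle."""
    return x_obs - x - d_min


def cbf_filter(u_nom, h, alpha):
    """Closest control to u_nom that satisfies the CBF condition.

    With h_dot = -u, the condition h_dot + alpha * h >= 0 becomes
    u <= alpha * h, so the projection is a simple min.
    """
    return min(u_nom, alpha * h)


def rollout(skill, x0=0.0, x_obs=5.0, d_min=1.0, alpha=1.0, dt=0.1, steps=200):
    """Execute one fixed skill through the filter with Euler integration.

    Returns the final position and the smallest barrier value observed.
    Choosing alpha * dt <= 1 keeps h non-negative in discrete time.
    """
    x = x0
    min_h = barrier(x, x_obs, d_min)
    for _ in range(steps):
        h = barrier(x, x_obs, d_min)
        u = cbf_filter(SKILLS[skill], h, alpha)  # low level: safe execution
        x += dt * u
        min_h = min(min_h, barrier(x, x_obs, d_min))
    return x, min_h


if __name__ == "__main__":
    for name in SKILLS:
        x_final, min_h = rollout(name)
        print(f"{name}: final x = {x_final:.3f}, min h = {min_h:.3e}")
```

In this toy setting the filter leaves the nominal speed untouched while the agent is far from the obstacle and slows it smoothly as h approaches zero. In the approach summarized above, both levels are learned rather than hand-written as they are here.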

Takeaways, Limitations

Takeaways: The paper presents an HMARL-CBF algorithm that improves both safety and performance in safety-critical multi-agent systems, achieving high success and safety rates even in complex environments.
Limitations: The algorithm has been validated only in one type of environment (conflicting road networks), so its generalizability to other environments or tasks requires further study. In addition, the complexity and computational cost of designing CBFs may limit its use in practical systems. Further analysis is needed on whether performance degrades when scaling to systems with very large numbers of agents.