
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator

Created by
  • Haebom

Authors

Xueyang Zhou, Weidong Wang, Lin Lu, Jiawen Shi, Guiyao Tie, Yongtian Xu, Lixing Chen, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

Outline

This paper proposes AutoSafe, a framework for improving the safety of agents based on large language models (LLMs). AutoSafe improves agent safety through fully automated synthetic data generation. It has two key components: an Open Extensible Threat Model (OTS) that models safety hazards across varied scenarios, and an automated data generation pipeline that simulates unsafe user behaviors and generates safe responses, producing a large, diverse, and high-quality safety training dataset. Experimental results show that AutoSafe improves safety scores by an average of 45% on synthetic and real safety benchmarks and achieves a 28.91% improvement on real tasks, validating the generalization ability of the learned safety strategies.
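
The pipeline described in the outline (enumerate threats, simulate an unsafe user request for each, pair it with a safe response) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Threat` fields, the function names, and the fixed string templates are all hypothetical stand-ins chosen for illustration, and a real system would presumably call an LLM where the templates are.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Threat:
    """One entry in a toy threat model (fields are illustrative assumptions)."""
    scenario: str  # where the agent operates, e.g. "email"
    risk: str      # what could go wrong, e.g. "data leak"


@dataclass(frozen=True)
class TrainingExample:
    threat: Threat
    unsafe_request: str
    safe_response: str


def simulate_unsafe_request(threat: Threat) -> str:
    # Hypothetical stand-in for a model that role-plays a risky user.
    return f"In the {threat.scenario} tool, do something that causes a {threat.risk}."


def generate_safe_response(threat: Threat) -> str:
    # Hypothetical stand-in for a model that writes the safe behavior.
    return f"I can't do that because it risks a {threat.risk}."


def build_dataset(scenarios, risks):
    """Enumerate every (scenario, risk) pair and attach a request/response pair."""
    dataset = []
    for scenario, risk in product(scenarios, risks):
        threat = Threat(scenario, risk)
        dataset.append(
            TrainingExample(
                threat,
                simulate_unsafe_request(threat),
                generate_safe_response(threat),
            )
        )
    return dataset


# Extending coverage only requires adding a scenario or a risk to the lists.
dataset = build_dataset(["email", "banking"], ["data leak", "financial loss"])
```

In this toy version, the dataset size is the product of the number of scenarios and the number of risks, so adding one entry to either list grows the dataset without any other change. That is only meant to suggest what "open and extensible" could look like in code, not how OTS is actually defined.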

Takeaways, Limitations

Takeaways:
Presents AutoSafe, an effective framework for improving the safety of LLM-based agents.
Eliminates the need to collect real-world risk data by building safety training datasets through fully automated synthetic data generation.
Models safety risks across various scenarios through the Open Extensible Threat Model (OTS).
Verifies experimentally the effectiveness of AutoSafe and the generalizability of its learned safety strategies.
Offers a practical and scalable approach to building secure LLM-based agents for real-world deployment.
Limitations:
Further validation is needed to confirm that the OTS is complete and covers all threat scenarios.
Differences between synthetic and real-world data may degrade generalization performance.
Further analysis is needed of the computing resource consumption and scalability limits of the AutoSafe framework.
Unpredictable risks that may arise during long-term operation are not considered.