Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents

Created by
  • Haebom

Author

Hang Su, Jun Luo, Chang Liu, Xiao Yang, Yichi Zhang, Yinpeng Dong, Jun Zhu

Outline

This paper addresses the emergence of autonomous AI agents and the resulting new security threats. The advancement of large-scale language models (LLMs) has led to the emergence of autonomous AI agents that perform perception, reasoning, and action in dynamic and open environments. These agents represent a paradigm shift from static reasoning systems to interactive and memory-enhanced entities, but they also introduce new security risks such as memory poisoning, tool misuse, reward hacking, and emergent mismatch that go beyond the threat models of existing systems or standalone LLMs. In this paper, we investigate the structural foundations and core features of long-term memory retention, modular tool use, recursive planning, and reflective reasoning that enhance the level of agent autonomy, and analyze security vulnerabilities across the agent stack, such as delayed decision risks, irreversible tool chains, and deceptive behavior due to internal state changes or value mismatches. In addition, we systematically review recent defense strategies deployed at various autonomy layers, such as input sanitization, memory lifecycle control, constrained decision making, structured tool invocation, and self-reflection, and introduce a reflective risk-aware agent architecture (R2A2) based on the constrained Markov decision process (CMDP). R2A2 integrates risk-aware world modeling, meta-policy adaptation, and joint reward-risk optimization to enable principled, proactive safety throughout the agent's decision-making loop.

Takeaways, Limitations

Takeaways:
We present a novel security threat model for autonomous AI agents and provide a systematic analysis of it.
We comprehensively review defensive strategies that operate at different levels of autonomy.
We propose a new agent architecture (R2A2) that takes risk awareness and safety into account.
Limitations:
There is a lack of experimental verification of the actual effectiveness and performance of the R2A2 architecture.
The types and scope of threats posed by emerging autonomous AI agents are so broad that it may not be possible to comprehensively address all risks.
There may be a lack of discussion about the difficulties of practical implementation and application of the proposed defense strategies.
👍