Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection

Created by
  • Haebom

Author

Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, Ye Wu

Outline

This paper presents a novel approach to the security vulnerabilities of Large Language Model (LLM) agents, particularly the risk of prompt injection attacks, by treating agent execution traces as structured programs. The authors propose AgentArmor, a program analysis framework that transforms agent traces into graph intermediate representations (control-flow graphs, data-flow graphs, program dependence graphs, etc.) and enforces security policies through a type system. AgentArmor consists of three main components: a graph constructor, a property registry, and a type system. By representing agent behavior as a structured program, it enables program analysis of sensitive data flows, trust boundaries, and policy violations. Evaluation on the AgentDojo benchmark shows that AgentArmor reduces the attack success rate (ASR) to 3% while limiting utility degradation to 1%.
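The core idea, analyzing an agent's runtime trace as a program with data-flow edges and trust labels, can be illustrated with a minimal taint-propagation sketch. This is not the paper's implementation: the `Step` schema, function names, and the tool names in the example trace are all assumptions for illustration, and the real framework additionally uses typed graph IRs and a property registry.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One entry in an agent's runtime trace (hypothetical schema)."""
    name: str
    tool: str
    reads: list = field(default_factory=list)  # earlier steps whose outputs this step consumes
    trusted: bool = True  # False if the data crosses the trust boundary (e.g. external content)


def tainted_steps(trace):
    """Propagate an 'untrusted' label along data-flow edges of the trace."""
    tainted = set()
    for step in trace:  # trace is in execution order, so a single pass suffices
        if not step.trusted or any(r in tainted for r in step.reads):
            tainted.add(step.name)
    return tainted


def violations(trace, sensitive_tools):
    """Flag sensitive tool calls whose inputs are influenced by untrusted data."""
    tainted = tainted_steps(trace)
    return [s.name for s in trace if s.tool in sensitive_tools and s.name in tainted]


# Example: an email read from outside the trust boundary flows into a sensitive sink.
trace = [
    Step("t1", "read_email", trusted=False),      # attacker-controllable content
    Step("t2", "summarize", reads=["t1"]),
    Step("t3", "send_email", reads=["t2"]),       # sensitive sink fed by tainted data
    Step("t4", "get_calendar"),                   # untainted, unaffected
]
print(violations(trace, {"send_email"}))  # → ['t3']
```

A policy engine in this style would block or require confirmation for the flagged calls rather than merely report them.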

Takeaways and Limitations

Takeaways:
  • Offers an effective defense against a key security vulnerability of LLM agents: prompt injection attacks.
  • Converting agent execution traces into structured programs enables the use of static analysis techniques.
  • AgentArmor can help reduce the security threats posed by prompt injection attacks.
  • Experimental results support the effectiveness and practicality of AgentArmor.
Limitations:
  • Further research is needed on AgentArmor's performance and effectiveness.
  • Applicability to diverse types of LLM agents and environments remains to be verified.
  • The accuracy and limits of the analysis on complex agent behavior need validation.
  • Adaptability to new attack types and agent designs has yet to be evaluated.