[공지사항]을 빙자한 안부와 근황 
Show more

Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PromptArmor: Simple yet Effective Prompt Injection Defenses

Created by
  • Haebom

Author

Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Aloair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, Dawn Song

Outline

In this paper, we point out that large-scale language model (LLM) agents are vulnerable to prompt injection attacks, and present PromptArmor, a simple yet effective defense against them. PromptArmor leverages existing LLMs to detect and remove malicious prompts from the input, ensuring that the agent performs the intended user action. On AgentDojo benchmarks using GPT-4, GPT-4.1, or o4-mini, PromptArmor achieves less than 1% false positive and false negative rates, and the attack success rate is reduced to less than 1% after removing malicious prompts with PromptArmor. We also demonstrate its effectiveness against adaptive attacks and explore various LLM prompting strategies. We propose that PromptArmor be adopted as a standard criterion for evaluating prompt injection attack defenses.

Takeaways, Limitations

Takeaways:
Presenting an effective defense technique against prompt injection attack vulnerability of LLM agent
Experimentally verified high accuracy and low error rate of PromptArmor
Demonstrating effectiveness against adaptive attacks
Presenting a standard criterion for evaluating new defense techniques
Limitations:
PromptArmor's performance may depend on the LLM model used.
Generalization performance against new types of prompt injection attacks requires further study.
Further validation of PromptArmor's effectiveness and reliability in real-world environments is needed.
👍