[공지사항]을 빙자한 안부와 근황 
Show more

Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

How Not to Detect Prompt Injections with an LLM

Created by
  • Haebom

Author

Sarthak Choudhary, Divyam Anshumaan, Nils Palumbo, Somesh Jha

Outline

This paper studies prompt injection attacks on large-scale language model (LLM)-based applications and agents. In particular, we reveal the structural vulnerability of Known-Answer Detection (KAD), a conventional prompt injection defense technique, and propose DataFlip, a novel attack technique that exploits it. DataFlip effectively evades KAD defense techniques (detection rate below 1.5%) and induces malicious behavior with a high success rate (up to 88%) without white-box access or optimization procedures for LLM.

Takeaways, Limitations

Takeaways: By revealing the fundamental vulnerability of KAD-based prompt injection attack defense techniques, we question the reliability of existing defense techniques and suggest the need for the development of more powerful defense techniques. The DataFlip attack technique empirically demonstrates the security vulnerability of LLM-based systems.
Limitations: Since this study presented an attack technique against a specific KAD defense technique, its effectiveness against other types of defense techniques requires additional research. In addition, the success rate of DataFlip may vary depending on the specific environment, and its effectiveness in real environments should be verified through additional experiments.
👍