Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Defeating Prompt Injections by Design

Created by
  • Haebom

Author

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tram er

Outline

In this paper, we propose CaMeL, a robust defense mechanism that protects against prompt injection attacks by creating a layer of protection around LLMs, which are increasingly deployed in agent systems interacting with untrusted environments. CaMeL explicitly extracts control and data flows from trusted queries, preventing untrusted data retrieved by LLMs from influencing program flow. It also uses the concept of features to enforce security policies when tools are invoked, preventing private data leakage through unauthorized data flows. We demonstrate the effectiveness of CaMeL by solving 77% of tasks with provably secure security on AgentDojo (84% for unprotected systems). The source code is available at https://github.com/google-research/camel-prompt-injection .

Takeaways, Limitations

Takeaways: We present an effective defense mechanism against prompt injection attacks of LLM agents processing untrusted data. CaMeL enhances LLM security through program flow control and data leakage prevention.
Limitations: The success rate of the CaMeL-based system (77%) is slightly lower than that of the unprotected system (84%). Since this is a performance evaluation result in an AgentDojo environment, further research is needed to determine generalizability to other environments. It does not guarantee complete defense against all types of prompt injection attacks.
👍