Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PromptKeeper: Safeguarding System Prompts for LLMs

Created by
  • Haebom

Author

Zhifeng Jiang, Zhihua Jin, Guoliang He

Outline

This paper proposes a defense mechanism called PromptKeeper to address the security concerns surrounding system prompts that guide the output of large-scale language models (LLMs). System prompts often contain business logic and sensitive information, making them vulnerable to exploitation of LLM vulnerabilities through malicious or common user queries. PromptKeeper addresses two key challenges: reliably detecting prompt leaks and mitigating side-channel vulnerabilities when leaks occur. By framing leak detection as a hypothesis testing problem, it effectively identifies both explicit and subtle leaks. When a leak is detected, it regenerates responses using dummy prompts, making them indistinguishable from normal interactions without leaks. Consequently, it provides robust protection against prompt extraction attacks via malicious or common queries, while maintaining the conversational capabilities and execution efficiency of typical user interactions.

Takeaways, Limitations

Takeaways:
Providing effective solutions to security threats in LLM system prompts.
Provides strong defense against both malicious attacks and common user queries.
Presenting an efficient mechanism for prompt leak detection and mitigation.
Maintain conversational skills and execution efficiency
Limitations:
Further evaluation of the performance and stability of the proposed defense mechanism in real environments is needed.
Generalizability verification is needed for various types of LLM and attack techniques.
Optimization and security enhancement of dummy prompt generation strategy are needed.
Additional overhead and performance degradation that may occur when applied to actual systems need to be analyzed.
👍