[공지사항]을 빙자한 안부와 근황 
Show more

Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems

Created by
  • Haebom

Author

Andrii Balashov, Olena Ponomarova, Xiaohua Zhai

Outline

This paper presents a comprehensive study of emerging security threats faced by large-scale language models (LLMs) deployed in enterprise environments (e.g., Microsoft 365 Copilot), specifically multi-round prompt inference attacks. We simulate realistic attack scenarios in which an attacker exploits LLMs integrated into enterprise sensitive data (e.g., SharePoint documents or emails) using questions that do not reveal malicious intent and indirect prompt injection. We develop and analyze a formal threat model for multi-round inference attacks using probability theory, an optimization framework, and information-theoretic leakage bounds. We show that the attacks reliably leak sensitive information in the context of LLMs even when standard safeguards are implemented. In this paper, we propose and evaluate defense techniques including statistical anomaly detection, fine-grained access control, prompt hygiene techniques, and architectural modifications to the LLM deployment. Each defense is supported by mathematical analysis or experimental simulations. For example, we derive bounds on information leakage in differential privacy-based training and show an anomaly detection method that flags multi-round attacks with high AUC. We also introduce an approach called “spotlighting” that uses input transformations to isolate untrusted prompt content and reduce attack success rates by a factor of 10. Finally, we provide a formal proof of concept and empirical validation of a combined defense-in-depth strategy. This study highlights that securing LLM in enterprise environments requires moving beyond single-shot prompt filtering to a holistic, multi-stage view of both attack and defense.

Takeaways, Limitations

Takeaways:
Provides a comprehensive threat model and analysis of multi-stage prompted inference attacks on LLM in an enterprise environment.
Propose and evaluate various defense techniques (statistical anomaly detection, fine-grained access control, prompt hygiene, architectural modifications, etc.).
Deriving boundaries for information leakage under differential privacy-based training.
Reduced attack success rate through new defensive techniques such as “spotlighting”.
Demonstrating the effectiveness of a combined defense-in-depth strategy.
Emphasizes the importance of a multi-level security approach beyond single-turn prompt filtering.
Limitations:
Research results from a simulated environment rather than a real business environment.
Further research is needed on the practical applicability and performance of the proposed defense technique.
Generalizability to different types of LLMs and attack strategies needs to be examined.
Further analysis is needed on the performance degradation of defense techniques.
👍