Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs

Created by
  • Haebom

Author

Aayush Gupta

Outline

This paper presents Contextual Integrity Verification (CIV), a novel security architecture that addresses the vulnerability of large language models (LLMs) to prompt injection and related jailbreak attacks. CIV attaches a cryptographically signed source label to each token and enforces a source-trust hierarchy inside the Transformer via a pre-softmax hard attention mask. On a frozen model, this yields a deterministic non-interference guarantee: lower-trust tokens cannot influence higher-trust representations. Experiments show that CIV achieves a 0% attack success rate on benchmarks built from state-of-the-art prompt injection attack vectors, while preserving 93.1% token similarity and showing no degradation in model perplexity under normal operation. Results of applying CIV to Llama-3-8B and Mistral-7B are also reported, and a reference implementation, an automated verification tool, and the Elite-Attack corpus are released to support reproducible research.
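
To make the mechanism concrete, here is a minimal, hypothetical sketch of the two ingredients the summary describes: an HMAC-signed source label per token and a trust-ordered attention rule enforced as a hard mask before the softmax. The key, role names, and trust values are illustrative assumptions, not the paper's reference implementation.

```python
import hmac
import hashlib
import torch

# Illustrative trust hierarchy (higher = more trusted); names and values are assumed.
TRUST = {"system": 3, "user": 2, "tool": 1, "web": 0}
SECRET_KEY = b"demo-key-not-for-production"  # placeholder key for the sketch

def sign_label(token_id: int, source: str) -> bytes:
    """Cryptographically sign one token's source label (HMAC-SHA256 sketch)."""
    return hmac.new(SECRET_KEY, f"{token_id}:{source}".encode(), hashlib.sha256).digest()

def verify_label(token_id: int, source: str, tag: bytes) -> bool:
    """Reject any token whose claimed source label does not verify."""
    return hmac.compare_digest(sign_label(token_id, source), tag)

def civ_allowed(trust_levels: torch.Tensor) -> torch.Tensor:
    """
    Boolean (seq, seq) matrix: query i may attend to key j only if the key's
    trust is at least the query's, so low-trust tokens cannot influence
    high-trust representations. Applied as a hard mask before the softmax.
    """
    return trust_levels.unsqueeze(-2) >= trust_levels.unsqueeze(-1)

# Example: system prompt token, user query token, then retrieved web token.
levels = torch.tensor([TRUST["system"], TRUST["user"], TRUST["web"]])
print(civ_allowed(levels))
# tensor([[ True, False, False],
#         [ True,  True, False],
#         [ True,  True,  True]])
```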

Takeaways, Limitations

Takeaways:
  • Presents an effective defense mechanism against prompt injection attacks.
  • Can be applied to existing models as a lightweight patch, with no fine-tuning required (see the sketch after this list).
  • Provides deterministic, token-level non-interference guarantees.
  • Maintains high security with minimal performance degradation.
  • Releases materials to support reproducible research.
Limitations:
  • Latency overhead remains due to an unoptimized data path.
  • The guarantees hold only within the presented threat model.
  • Further validation in diverse real-world environments is still needed.
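
Following up on the "lightweight patch" takeaway above, here is a hypothetical sketch of how such a hard mask could be dropped into a frozen model's attention computation without changing any weights. The shapes, trust values, and the use of torch.nn.functional.scaled_dot_product_attention are assumptions made for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def civ_masked_attention(q, k, v, trust_levels):
    """
    Attention for a frozen model with the CIV rule folded into the additive
    pre-softmax mask alongside the usual causal mask. No weights are modified.
    q, k, v: (batch, heads, seq, head_dim); trust_levels: (seq,) integer levels.
    """
    seq = trust_levels.shape[0]
    causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))        # j <= i
    civ = trust_levels.unsqueeze(-2) >= trust_levels.unsqueeze(-1)     # trust[j] >= trust[i]
    mask = torch.zeros(seq, seq, dtype=q.dtype)
    mask.masked_fill_(~(causal & civ), float("-inf"))                  # hard mask, applied pre-softmax
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# Toy usage: 1 batch, 2 heads, 6 tokens, 8-dim heads (values are illustrative).
q = torch.randn(1, 2, 6, 8)
k = torch.randn(1, 2, 6, 8)
v = torch.randn(1, 2, 6, 8)
levels = torch.tensor([3, 3, 2, 2, 0, 0])  # system, system, user, user, web, web
out = civ_masked_attention(q, k, v, levels)  # shape (1, 2, 6, 8)
```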