Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

AEGIS: Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema

Created by
  • Haebom

Author

Ting-Chun Liu, Ching-Yu Hsu, Kuan-Yi Lee, Chi-An Fu, Hung-yi Lee

Outline

Prompt injection attacks pose a serious challenge to the secure deployment of large-scale language models (LLMs) in real-world applications. To address this issue, the authors propose AEGIS, an automated co-evolutionary framework for defending against prompt injection attacks. Attacking and defending prompts are iteratively co-optimized using a text gradient optimization (TGO) module, leveraging feedback from an LLM-based evaluation loop. On real-world task scoring datasets, AEGIS consistently outperforms existing baselines, achieving superior robustness in both attack success rate and detection.

Takeaways, Limitations

Takeaways:
The automated co-evolution framework AEGIS provides a robust defense strategy against prompt injection attacks.
It achieves better performance than existing methods through automatic evolution of attack and defense prompts.
Improved attack success rate (ASR) (1.0 achieved) and improved detection performance (TPR 0.84, TNR 0.89)
We demonstrate the importance of coevolution, gradient buffering, and multi-objective optimization.
It has been proven effective in various LLMs.
Limitations:
The specific Limitations is not specified in the paper.
👍