Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Effective Red-Teaming of Policy-Adherent Agents

Created by
  • Haebom

Authors

Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby-Tavor

Outline

This paper addresses the vulnerabilities of policy-compliant, task-oriented LLM-based agents in environments where policy compliance is crucial (e.g., refund eligibility, cancellation policies). To assess an agent's robustness against malicious users who try to induce policy violations, we propose CRAFT, a multi-agent red team system that attacks policy-compliant agents using policy-aware persuasion strategies. Building on the existing tau-bench benchmark, we introduce tau-break, a complementary benchmark designed to rigorously assess an agent's robustness against manipulative user behavior. We then evaluate several defense strategies and show their limitations.

Takeaways, Limitations

Takeaways:
We clearly identify security vulnerabilities in policy-compliant LLM agents and provide new benchmarks (tau-break) and a red team system (CRAFT) to assess them.
We evaluate the effectiveness of defense strategies against the various attack strategies used by malicious users and highlight the need for more robust defense mechanisms.
We present a policy violation attack method that is more effective than existing approaches such as DAN prompts, emotional manipulation, and coercion.
Limitations:
The proposed defense strategies are not perfect solutions, suggesting that more robust, research-based safeguards are needed.
Because the evaluation focuses on a specific environment (customer service scenarios), further research is needed to determine whether CRAFT and tau-break generalize and apply to other domains.