This paper addresses the vulnerabilities of policy-compliant, task-oriented LLM-based agents in settings where adherence to policy is crucial (e.g., refund eligibility, cancellation policies). To assess an agent's robustness against malicious users who attempt to induce policy violations, we propose CRAFT, a multi-agent red-teaming system that attacks policy-compliant agents using policy-aware persuasion strategies. Building on the existing tau-bench benchmark, we introduce tau-break, a complementary benchmark designed to rigorously assess agent robustness against manipulative user behavior. We then evaluate several defense strategies and demonstrate their limitations.