This paper examines defense-in-depth pipelines that use multiple layers of safeguards to protect against catastrophic misuse of state-of-the-art AI systems. We note that the security of the safeguard pipelines deployed by several frontier developers, including the pipeline Anthropic uses to guard its Claude 4 Opus model, is unclear, and that little prior research has evaluated or attacked such pipelines. This paper aims to address this gap by developing an open-source defense pipeline and red-teaming it. We develop a novel few-shot-prompted input and output classifier that outperforms ShieldGemma, an existing state-of-the-art safeguard model, and present a novel STaged AttaCK (STACK) procedure that achieves a substantial attack success rate even in a black-box setting. Finally, we present mitigations that developers can use to thwart staged attacks.